MOLECULAR TYPING OF GROUP B STREPTOCOCCI



Field of the invention 

The present invention relates to molecular methods of typing group B
streptococci, as well as polynucleotides useful in such methods.

Background to the invention 

Group B streptococcus (GBS) - Streptococcus agalactiae - is the
commonest cause of neonatal and obstetric sepsis and an increasingly important
cause of septicaemia in the elderly and immunocompromised patients. The
incidence of neonatal GBS sepsis has been reduced in recent years by the use of
intrapartum antibiotic prophylaxis, but there are many problems with this
approach. In future, vaccination is likely to be preferred and there has been
considerable progress in development of conjugate polysaccharide GBS
vaccines.

Before the introduction of conjugate vaccines, extensive epidemiological
and other related studies will be required to assess not only the burden of
disease but also the distribution of GBS types (including capsular polysaccharide
gene serotypes, serosubtypes; protein antigen gene subtypes; mobile genetic
element subtypes) to determine the optimal formulation of vaccine antigens. Type
distribution based on one geographic location or small numbers of patients may
not be generally applicable. Continued monitoring will be necessary to assess the
suitability of combinations of GBS vaccine antigens for different target
populations in different geographic locations.

Nine capsular polysaccharide GBS serotypes have been described
(Harrison et al., 1998; Hickman et al., 1999). Various serotyping methods have
been used, including immuno-precipitation (Wilkinson and Moody, 1969), enzyme
immunoassay (Holm and Hakansson, 1988), coagglutination (Hakansson et al.,
1992), counter-immunoelectrophoresis and capillary precipitation (Triscott and
Davies, 1979), latex agglutination (Zuerlein et al., 1991), fluorescence microscopy
(Cropp et al., 1974) and inhibition-ELISA (Arakere et al., 1999). These methods
are labour-intensive and require high-titered serotype-specific antisera, which are
expensive, difficult to make and commercially available for only six serotypes,
Ia to V (Arakere et al., 1999). Molecular genotyping methods, such as pulsed-
field gel electrophoresis (Rolland et al., 1999) and restriction endonuclease analysis
(Nagano et al., 1991), are useful for epidemiological studies but do not generally
identify serotypes. Consequently, there is a need for a reliable molecular method
for GBS serotype identification.



WO 03/025216 






PCT/AU02/01281 



Summary of the invention 

We have identified specific regions within the genome of group B 
streptococci of inter-type sequence heterogeneity that can be used to distinguish 
different types (including capsular polysaccharide gene serotypes and 
serosubtypes; protein antigen gene subtypes; and mobile genetic element 
subtypes). We have shown that molecular methods that detect these sequence 
heterogeneities can be used to accurately distinguish and type group B 
streptococci. 

Accordingly, in a first aspect the present invention provides a method of
typing a group B streptococcal bacterium which method comprises analysing the
nucleotide sequence of one or more regions within the cpsD, cpsE, cpsF, cpsG
and/or cpsI/M genes of said bacterium, said region(s) comprising one or more
nucleotides whose sequence varies between types.

In particular, the nucleotide sequence may be analysed for one or more
positions corresponding to positions 62, 78-86, 138, 139, 144, 198, 204, 211,
281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645,
803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512,
1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892,
1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

Preferably at least one region is within a sequence delineated by the 3'
136 bases of the cpsE gene and the 5' 218 bases of the cpsG gene of the cpsE-
cpsF-cpsG gene cluster of said group B streptococcal bacterium. In particular,
the nucleotide sequence may be analysed for one or more positions 
corresponding to positions 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 
1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 
2134, 2187 and 2196 as shown in Figure 1. 

In one embodiment, at least one region is within the cpsI/M genes of said
group B streptococcal bacterium.

We have also shown that a number of surface protein antigen genes,
including rib, alp2 or alp3 genes, and five mobile genetic elements may be used
to subtype GBS at the molecular level. Accordingly, the present invention also
provides a method of typing a group B streptococcal bacterium which method
comprises determining the presence or absence in the genome of said bacterium
of one or more surface protein antigen genes selected from a rib, alp2 or alp3
gene, and/or one or more mobile genetic elements selected from IS861, IS1548,
IS1381, ISSa4 and GBSi1. Preferably, such a method is combined with the above
methods of the invention.

The nucleotide sequence analysis step may comprise sequencing said
one or more regions. Alternatively, or in addition, the nucleotide sequence
analysis step may comprise determining whether a polynucleotide obtained from
said bacterium selectively hybridises to a polynucleotide probe comprising one or
more of the said regions, preferably to one or more of a plurality of polynucleotide
probes corresponding to one or more of the said regions.

In a preferred embodiment, where hybridisation to a plurality of probes is
used as a means of analysis, the plurality of polynucleotide probes are present as
a microarray.

In another embodiment, the nucleotide sequence analysis step comprises
an amplification step using one or more primers, at least one of which hybridises
specifically to a sequence which differs between types. Typically, primer pairs
are used, at least one member of which hybridises specifically to a sequence
which differs between types. Preferably, said primers are selected from the
primers shown in Table 2 and/or Table 6 and/or Table 10.

In a second aspect, the present invention provides a polynucleotide
consisting essentially of at least 10 contiguous nucleotides corresponding to a
region within a cpsD-cpsE-cpsF-cpsG gene of a group B streptococcal bacterium,
said polynucleotide comprising one or more nucleotides which differ between
GBS types.

Preferably the nucleotides which differ between GBS types correspond to
one or more of positions 62, 78-86, 138, 139, 144, 198, 204, 211, 281, 240, 249,
300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026,
1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595,
1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088,
2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially
of at least 10 contiguous nucleotides corresponding to a region within a sequence
delineated by the 3' 136 base pairs of cpsE and the 5' 218 base pairs of cpsG of
the cpsE-cpsF-cpsG gene cluster of a group B streptococcal bacterium, said
polynucleotide comprising one or more nucleotides which differ between GBS
types.

Preferably the nucleotides which differ between group B streptococcal
types correspond to one or more of positions 1413, 1495, 1500, 1501, 1512,
1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892,
1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially
of at least 10 contiguous nucleotides corresponding to a region within a cpsI/M
gene of a group B streptococcal bacterium, said polynucleotide comprising one or
more nucleotides which differ between group B streptococcal types.

Preferably the polynucleotide is selected from the nucleotide sequences
shown in Table 2.

The present invention further provides a polynucleotide consisting
essentially of at least 10 contiguous nucleotides corresponding to a region within
a rib, alp2 or alp3 gene of a group B streptococcal bacterium, said polynucleotide
comprising one or more nucleotides which differ between GBS protein antigen
gene subtypes.

Preferably the polynucleotide is selected from the nucleotide sequences 
shown in Table 6. 

The present invention further provides a polynucleotide consisting
essentially of at least 10 contiguous nucleotides corresponding to a region within
IS861, IS1548, IS1381, ISSa4 and/or GBSi1 of a group B streptococcal
bacterium, said polynucleotide comprising one or more nucleotides which differ
between GBS mobile genetic element subtypes.

Preferably the polynucleotide is selected from the nucleotide sequences
shown in Table 10.

The polynucleotides of the invention may be used in a method of typing, 
such as serotyping and/or subtyping, a group B streptococcal bacterium. 

In a third aspect the present invention provides a composition comprising a
plurality of polynucleotides of the second aspect of the invention. The
composition may be used in a method of typing, such as serotyping and/or
subtyping, a group B streptococcal bacterium.

In a fourth aspect the present invention provides a microarray comprising a
plurality of polynucleotides according to the second aspect of the invention. The
microarray may be used in a method of typing, such as serotyping and/or
subtyping, a group B streptococcal bacterium.

Detailed description of the invention 

Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as commonly understood by one of ordinary skill in the
art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization
techniques and biochemistry). Standard techniques are used for molecular,
genetic and biochemical methods (see generally, Sambrook et al., Molecular
Cloning: A Laboratory Manual, 3rd ed. (2001) Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y. and Ausubel et al., Short Protocols in Molecular
Biology (1999) 4th Ed., John Wiley & Sons, Inc. - and the full version entitled
Current Protocols in Molecular Biology, which are incorporated herein by
reference) and chemical methods.

The molecular typing methods of the present invention rely on detecting
the presence in a sample of specific polynucleotide sequences in regions of the
genome of group B streptococci (GBS) that we have identified as varying
between different types.

More specifically, the specific polynucleotide sequences that are to be
detected lie within the cpsD, cpsE, cpsF, cpsG, cpsI, cpsM, rib, alp2 and/or alp3
genes of GBS as well as mobile genetic elements IS861, IS1548, IS1381,
ISSa4 and GBSi1, preferably the cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes.
Regions of interest within those genes mentioned are regions whose
sequence varies between two or more types, i.e. are heterogeneous.
Heterogeneity may be due to insertions, deletions and/or substitutions between
corresponding regions in different types. In the case of rib, alp2 and alp3,
heterogeneity typically takes the form of the presence or absence of the entire
gene. Similarly, for elements IS861, IS1548, IS1381, ISSa4 and GBSi1,
heterogeneity typically takes the form of the presence or absence of the entire
sequence.

Specific regions of heterogeneity include the following positions: cpsD
gene - 62 and 78-86; cpsD-cpsE gene spacer - 138, 139 and 144; cpsE
gene - 198, 204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486,
602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413,
1495, 1500, 1501, 1512, 1518 and 1527; cpsF gene - 1595, 1611, 1620, 1627,
1629, 1655, 1832, 1856, 1866, 1871, 1892 and 1971; and cpsG gene - 2026,
2088, 2134, 2187 and 2196 (numbering corresponds to numbering shown in
Figure 1).
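
For computational work, the positions listed above can be held in a simple lookup table. The following Python sketch is illustrative only and is not part of the described method. The names HETEROGENEOUS_POSITIONS and extract_signature are hypothetical, and the input sequence is assumed to be aligned so that its first base corresponds to position 1 of the Figure 1 numbering.

```python
# Heterogeneous positions per region, copied from the text above
# (1-based, Figure 1 numbering). The dictionary name is hypothetical.
HETEROGENEOUS_POSITIONS = {
    "cpsD": [62] + list(range(78, 87)),
    "cpsD-cpsE spacer": [138, 139, 144],
    "cpsE": [198, 204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 457,
             466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173,
             1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527],
    "cpsF": [1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871,
             1892, 1971],
    "cpsG": [2026, 2088, 2134, 2187, 2196],
}


def extract_signature(aligned_seq, region):
    """Return {position: base} for the listed positions of one region.

    Assumes aligned_seq[0] corresponds to position 1 of Figure 1.
    Positions beyond the end of the sequence are skipped.
    """
    return {
        pos: aligned_seq[pos - 1]
        for pos in HETEROGENEOUS_POSITIONS[region]
        if pos <= len(aligned_seq)
    }
```

Comparing the dictionaries returned for two isolates would show at which of the listed positions they differ.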

Particularly preferred positions of interest are those that lie within a 790 bp
fragment of cpsE-cpsF-cpsG (which consists of approximately the 3' 136 bases
of cpsE to the 5' 218 bases of cpsG), namely positions 1413, 1495, 1500, 1501,
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871,
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

Another region of heterogeneity is position 62 of cpsD and a repetitive
sequence (TTACGGCGA) found at positions 78 to 86 of cpsD in some but not all
GBS serotypes.
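
A presence check for the repetitive sequence can be sketched as follows. This is a minimal illustration under the assumption that the input is aligned so that index 0 corresponds to position 1 of the Figure 1 numbering; the function name has_cpsd_repeat is hypothetical.

```python
CPSD_REPEAT = "TTACGGCGA"  # repetitive sequence named in the text


def has_cpsd_repeat(aligned_seq):
    """True if positions 78 to 86 (1-based) hold the repeat.

    Assumes aligned_seq[0] is position 1 of the Figure 1 numbering.
    """
    # Positions 78..86 inclusive map to the zero-based slice [77:86].
    return aligned_seq[77:86].upper() == CPSD_REPEAT
```
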

Specific regions of heterogeneity also include a number of positions within
the cpsI/M gene as shown in the sequence alignment depicted in Figure 3.

These regions of heterogeneity may be analysed using a variety of means
including sequencing, PCR and binding of labelled probes.

In the case of sequencing to identify serotype, the sequencing primers are
selected such that they hybridise specifically within or near a region of
heterogeneity. The primers need not be specific to particular serotypes, since it
is the actual sequence information obtained during the sequencing process that
is used to assign molecular serotype. Thus the primers may hybridise specifically
to all GBS serotypes (at least serotypes Ia to VII), or to specific serotypes.

Preferred primers anneal within 100, 50 or 20 contiguous nucleotides of a
heterogeneous position within the 790 bp region of cpsE-cpsF-cpsG shown in
Figure 1. Examples of suitable sequencing primers are shown in Table 2
(cpsES3, cpsFA, cpsFS, cpsGA and cpsGM).

PCR and other specific hybridisation-based serotyping methods will
typically involve the use of nucleotide primers/probes which bind specifically to a
region of the genome of a GBS serotype which includes a nucleotide which varies
between two or more serotypes. Thus the primers/probes may comprise a
sequence which is complementary to one of such regions. Where positions of
heterogeneity are close together (e.g. positions 198, 204, 211 and 218 of cpsE), it
may be desirable to use a primer/probe which hybridises specifically to a region
of the GBS genome that comprises two or more positions of heterogeneity. Thus
for example, a primer/probe may be designed that is complementary to
nucleotides 195 to 220 of cpsE. Such primers/probes are likely to have improved
specificity and reduce the likelihood of false positives.
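
Deriving a candidate primer/probe that spans a window such as nucleotides 195 to 220 amounts to taking a subsequence of a reference and, for the complementary strand, its reverse complement. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the reference is assumed to use 1-based Figure 1 numbering, and only unambiguous bases (A, C, G, T) are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_window(reference, start, end, antisense=False):
    """Return the bases at 1-based positions start..end inclusive.

    With antisense=True the reverse complement is returned, i.e. an
    oligonucleotide complementary to the given strand of the reference.
    """
    window = reference[start - 1:end]
    return reverse_complement(window) if antisense else window
```

For example, `probe_for_window(reference, 195, 220, antisense=True)` would give a 26-nucleotide oligonucleotide complementary to that window of whatever reference sequence is supplied.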

PCR-based methods of detection may rely upon the use of primer pairs, at
least one of which binds specifically to a region of interest in one or more, but not
all, serotypes. Unless both primers bind, no PCR product will be obtained.
Consequently, the presence or absence of a specific PCR product may be used
to determine the presence of a sequence indicative of specific GBS serotypes.
However, as mentioned, only one primer need correspond to a region of
heterogeneity in the genes of interest (such as the cpsD, cpsE, cpsF, cpsG, cpsI
and/or cpsM genes). The other primer may bind to a conserved or heterogeneous
region within said gene or even a region within another part of the GBS genome,
such as the cpsH gene, whether said region is conserved or heterogeneous
between serotypes. Thus, for example, a combination of a primer (cpsGS) which
binds to a region of the cpsG gene including positions 2172 to 2210, and a primer
which binds to a region of the cpsH gene which is heterogeneous (IacpsHA1,
IIIcpsHA), may be used as the basis of distinguishing serotypes (Ia and III).
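
The rule that a product is obtained only when both primers bind can be illustrated with a simple in silico check. The sketch below is an assumption-laden simplification, not the method of the Examples: it requires exact matches (real primers tolerate some mismatches), it considers only one orientation of the template, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the predicted product length, or None if no product.

    The forward primer must match the given strand exactly, and the
    reverse primer must match the opposite strand downstream of it.
    """
    template = template.upper()
    forward = forward.upper()
    fwd_start = template.find(forward)
    if fwd_start == -1:
        return None  # forward primer does not bind
    # Site on the given strand where the reverse primer anneals.
    rev_site = reverse_complement(reverse.upper())
    rev_start = template.find(rev_site, fwd_start + len(forward))
    if rev_start == -1:
        return None  # reverse primer does not bind downstream
    return rev_start + len(rev_site) - fwd_start
```

A return value of None corresponds to "no PCR product", while a length can be compared with the expected product size for a given serotype.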

Further, a primer which binds to a region of cpsI which is heterogeneous
may be combined with a primer which binds to a region of cpsG which is
constant. An example of such a primer pair is VIcpsIA and cpsGS1,
which gives rise to a PCR product of 1517 bp and is specific for GBS serotype VI.

Alternatively, primers that bind to conserved regions of the GBS genome 
but which flank a region whose length varies between serotypes may be used. In 
this case, a PCR product will always be obtained when GBS bacteria are present 
but the size of the PCR product varies between serotypes. 

Furthermore, a combination of specific binding of one or both primers and
variations in the length of the PCR product may be used as a means of identifying
particular molecular serotypes.

Examples of specific primers/probes which target the cpsD, cpsE, cpsF,
cpsG, cpsI or cpsM genes include the following:



cpsDS GCA AAA GAA CAG ATG GAA CAA AGT GG
cpsES CTT TTG GAG TCG TGG CTA TCT TG
cpsEA1 GA/T/GA AAA AAG GAA AGT CGT GTC G/ATT G
cpsES1 CTT GGA C/TTC CTC TGA AAA GGA TTG
cpsEA2 AAA A/CGC TTG ATC AAC AGT TAA GCA GG
cpsES2 GAT GGT/C GGA CCG GCT ATC TTT TCT C
cpsEA3 CTT AAT TTG TTC TGC ATC TAC TCG C
cpsES3 GTT AGA TGT TCA ATA TAT CAA TGA ATG GTC TAT TTG GTC AG
cpsEFA CCT TTC AAA CCT TAC CTT TAC TTA GC
cpsFS CAT CTG GTG CCG CTG TAG CAG TAC CAT T
cpsFA GTC GAA AAC CTC TAT A/GT A AAC/T GGT CTT ACA A/GCC AAA TAA CTT ACC
cpsGA AAG/C AGT TCA TAT CAT CAT ATG AGA G
cpsGA1 CCG CCA/G TGT GTG ATA ACA ATC TCA GCT TC
cpsGS ATG ATG ATA TGA ACT CTT ACA TGA AAG AAG CTG AGA TTG
cpsGS1 GAA CTC TTA CAT GAA AGA AGC TGA GAT TGT TAT CAC AC
IbcpsIA CTA TCA ATG AAT GAG TCT GTT GTA GGA CGG ATT GCA CG
IbcpsIS GAT AAT AGT GGA GAA ATT TGT GAT AAT TTA TCT CAA AAA GACG
IbcpsIA1 CCT GAT TCA TTG CAG AAG TCT TTA CGA TGC GAT AGG TG
IVcpsMA GGG TCA ATT GTA TCG TCG CTG TCA ACA AAA CCA ATC AAA TC
VcpsMA CCC CCC ATA AGT ATA AAT AAT ATC CAA TCT TGC ATA GTC AG
VIcpsIA GAA GCA AAG ATT CTA CAC AGT TCT CAA TCA CTA ACT CCG
cpsIA GTA TAA CTT CTA TCA ATG GAT GAG TCT GTT GTA GTA CGG

The primer designations correspond to those given in Table 2. 
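
Several of the listed sequences contain slashes (for example "AAG/C AGT ..."). Assuming, as is conventional but not stated in the text, that bases joined by a slash are alternatives at a single position, each such sequence stands for a small set of concrete oligonucleotides. The Python sketch below expands that notation; the function name expand_slash_notation is hypothetical.

```python
from itertools import product


def expand_slash_notation(primer):
    """List every concrete sequence encoded by slash notation.

    Assumption: 'G/C' means G or C at one position, and 'A/T/G'
    means A, T or G at one position. Spaces are ignored.
    """
    s = primer.replace(" ", "").upper()
    groups = []
    i = 0
    while i < len(s):
        alternatives = [s[i]]
        i += 1
        # Collect every alternative chained with a slash.
        while i + 1 < len(s) and s[i] == "/":
            alternatives.append(s[i + 1])
            i += 2
        groups.append(alternatives)
    return ["".join(bases) for bases in product(*groups)]
```

Under this reading, a sequence with one two-way position yields two oligonucleotides, and one with a three-way and a two-way position yields six.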

In relation to the alp2, alp3 and rib surface protein antigen genes,
heterogeneity and protein antigen gene subtype are assessed more at the level of
whether a group B streptococcal bacterium contains the gene or not. Our results
show that the specific combination of surface protein genes present in a GBS
genome is indicative of serotype/serosubtypes (see Table 9). Consequently,
primers/probes suitable for use in the methods of the present invention are those
that are specific for the particular genes. Thus probes/primers that are specific
for alp2 or alp3 or rib are preferred. Figure 4 shows an alignment of alp2 and
alp3 that was used to design primers specific for alp2 or specific for alp3.

Examples of specific primers/probes which target the alp2, alp3 and rib
genes include the following:

bcaS1 GGT AAT CTT AAT ATT TTT GAA GAG TCA ATA GTT GCT GCA TCT AC

bcaS2 CCAGGGA GTG CAG CGA CCT TAA ATA CAA GCA TC 

balS GAT CCT CAA AAC CTC ATT GTA TTA AAT CCA TCA AGC TAT TC 

balA CCA GTT AAG ACT TCA TCA CGA CTC CCA TCA C 

bal23S1 CAG ACT GTT AAA GTG GAT GAA GAT ATT ACC TTT ACG G 

bal23S2 CTT AAA GCT AAG TAT GAA AAT GAT ATC ATT GGA GCT CGT G 

bal2S CTT CCG CCA GAT AAA ATT AAG 

bal2A CTG TTG ACT TAT CTG GAT AGG TC 

bal2A1 CGT GTT GTT CAA CAG TCC TAT GCT TAG CCT CTG GTG 

bal2A2 GGT ATC TGG TTT ATG ACC ATT TTT CCA GTT ATA CG 

bal3S GTT CTT CCG CTT AAG GAT AG 

bal3A GAC CGT TTG GTC CTT ACC TTT TGG TTC GTT GCT ATC C 
ribS2 GAAGTAATTTCAG GAA GTG CTG TTA CGT TAA ACA CAA ATA TG 
ribA1 GAA GGT TGT GTG AAA TAA TTG CCG CCT TGC CTA ATG 
ribA2 AAT ACT AGC TGC ACC AAC AGT AGT CAA TTC AGA AGG 
The primer designations correspond to those given in Table 6. 

In relation to IS861, IS1548, IS1381, ISSa4 and GBSi1, heterogeneity
and subtype are assessed more at the level of whether a group B streptococcal
bacterium contains the element or not. The number of elements may also be
assessed. Our results show that the specific combination of mobile elements
present in a GBS genome is indicative of serotype/serosubtype (see Table 12).
Consequently, primers/probes suitable for use in the methods of the present
invention are those that are specific for the particular mobile genetic elements.
Thus probes/primers that are specific for IS861, IS1548, IS1381, ISSa4 and
GBSi1 are preferred.

Examples of specific primers/probes which target IS861, IS1548, IS1381,
ISSa4 and GBSi1 include the following:

IS861S GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG
IS861A1 CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C
IS861A2 CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG
IS1548S CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC
IS1548S1 GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG
IS1548A1 CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC
IS1548A2 CCC AAT ACC ACG TAA CTT ATG CCA TTT G
IS1548A3 CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC
IS1381S1 CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG
IS1381S2 GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG
IS1381A CTA AAA TCC TAG TTC ACG GTT GAT CAT TCC AGC
ISSa4S CGT ATC TGT CAC TTA TTT CCC TGC GGG TGT CTC C
ISSa4A1 GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G
ISSa4A2 CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC
GBSi1S1 CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG
GBSi1S2 GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC
GBSi1A1 AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC
GBSi1A2 CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG



Preferably, the primers/probes comprise at least 10, 15 or 20 nucleotides.
Typically, primers/probes consist of fewer than 100, 50 or 30 nucleotides.
Primers/probes are generally polynucleotides comprising deoxynucleotides. They
may also be polynucleotides which include within them synthetic or modified
nucleotides. A number of different types of modification to oligonucleotides are
known in the art. These include methylphosphonate and phosphorothioate
backbones and the addition of acridine or polylysine chains at the 3' and/or 5'
ends of the molecule. For the purposes of the present invention, it is to be
understood that the polynucleotides described herein may be modified by any
method available in the art. Primers/probes may be labelled with any suitable
detectable label such as radioactive atoms, fluorescent molecules or biotin.

In one embodiment, primers/probes have a high melting temperature of
>70°C so that they may be used in rapid cycle PCR.
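
The text does not state how melting temperature is calculated. As a rough screening aid only, the sketch below uses a widely quoted GC-content approximation for oligonucleotides longer than about 13 nucleotides, Tm = 64.9 + 41 x (G + C - 16.4) / N. This formula is an assumption brought in for illustration, other methods (for example nearest-neighbour models) give different values, and the function names are hypothetical.

```python
def estimate_tm(primer):
    """Approximate melting temperature in degrees Celsius.

    Uses Tm = 64.9 + 41 * (G + C - 16.4) / N, a basic GC-content
    approximation. This is an assumed formula, not one given in
    the text. Spaces in the primer are ignored.
    """
    seq = primer.replace(" ", "").upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / n


def within_length_bounds(primer, minimum=10, maximum=100):
    """Check the length range stated in the text (at least 10 and
    fewer than 100 nucleotides, using the broadest stated limits)."""
    n = len(primer.replace(" ", ""))
    return minimum <= n < maximum
```

Because the estimate depends strongly on the formula chosen, a value near the 70°C threshold should be confirmed with whichever method a laboratory normally uses.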

Compositions comprising a plurality of nucleotides that are used to analyse
one or more regions within the cpsD, cpsE, cpsF, cpsG, or cpsI/M genes may
also further comprise nucleotides that may be used to analyse one or more
regions within the cpsH gene. Suitable nucleotides are described in the
Examples, although a person skilled in the art could design other suitable
sequences based on the sequence alignment shown in Figure 3.

Further, compositions comprising a plurality of nucleotides that are used to
analyse one or more regions within alp2, alp3 or rib genes may also further
comprise nucleotides that may be used to analyse one or more regions within the
C alpha (bca) and C beta (bac) genes (C beta gene also known as bag).

A variety of techniques may be used to analyse one or more regions within
the genome of a bacterium of interest. Typically, a sample of interest, which is
suspected of containing GBS bacteria, is treated using standard techniques to
obtain genomic DNA from any microorganisms present in the sample. It may be
desirable for a number of subsequent detection steps to use nucleic acid
preparation techniques that result in substantial fragmentation of the genomic
DNA. The sample may be from a bacterial culture or a clinical sample from a
patient, typically a human patient. Clinical samples may be cultured to produce a
bacterial culture. However, it is also possible to test clinical samples directly
without a culturing step.

The genomic DNA is then subjected to one or more analysis steps which
may include sequencing, enzymatic amplification and/or hybridisation. These
general techniques of DNA analysis are known in the art and are discussed in
detail in, for example, Sambrook et al. 2001 and Ausubel et al. 1999 supra.

Serotyping may involve one or more steps. For example, it may be
desirable to carry out an initial step of determining whether there are nucleotide
sequences present in the sample which are conserved between GBS serotypes
but not found in any other organism. This may be achieved by using PCR
primers that detect any (but only) GBS bacteria (e.g. using primer pairs
Sag59/Sag190 and/or DSF2/DSR1 - see Tables 2 and 3).

Molecular serotyping for specific GBS serotypes can then be performed by
detecting the presence of one or more regions of heterogeneity in the regions of
interest using any suitable technique such as sequencing, enzymatic
amplification and/or hybridisation based on the probes/primers discussed above.
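
The two-stage logic described above (a GBS-specific test first, then type-specific tests) can be sketched as a small decision function. This is a hypothetical illustration and does not reproduce the algorithm of Figure 2; the function name, argument names and result strings are all assumptions.

```python
def assign_serotype(gbs_specific_positive, type_specific_results):
    """Combine a species-level result with type-specific results.

    gbs_specific_positive: True if the GBS-specific assay was positive.
    type_specific_results: dict mapping a serotype label to True/False
    for the corresponding type-specific assay.
    """
    if not gbs_specific_positive:
        return "not GBS"
    positives = sorted(t for t, hit in type_specific_results.items() if hit)
    if len(positives) == 1:
        return positives[0]
    if not positives:
        return "GBS, serotype unresolved"
    # More than one type-specific assay positive: report all of them.
    return "GBS, ambiguous: " + "/".join(positives)
```

An unresolved or ambiguous result would, in practice, be followed by a further analysis step such as sequencing of the region of interest.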

A particularly preferred detection technique is PCR, such as rapid cycle 
PCR (Kong et al., 2000). 

An example of a multi-step serotyping strategy (algorithm) is shown in
Figure 2. However, a variety of other strategies are envisaged and can be
designed by the skilled person using the sequence heterogeneity information
presented herein. In particular, it is preferred that the serotyping procedure
comprise at least one analysis step based on analysing one or more regions of the
cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes. This analysis may optionally be
combined with an analysis of one or more regions within the cpsH gene. Similar
techniques may be used to analyse the cpsH gene regions and suitable primer
sequences and methods are also described in the Examples.

Analysis of the presence or absence of the alp2, alp3 and/or rib genes may
optionally be combined with an analysis of the presence or absence of C alpha
(bca) and C beta (bac) gene sequences as is described in the Examples.
Similar techniques may be used to analyse these regions and suitable primer
sequences and PCR methods are also described in the Examples.

Furthermore, analysis of the presence or absence of the alp2, alp3 and/or
rib genes (and optionally the bca and bac genes) may be combined with an
analysis of the presence or absence of mobile genetic elements.

Thus a typing strategy may involve an analysis of cps genes, surface
protein genes and/or mobile genetic elements in various combinations to provide
more serosubtyping and subtyping information.

Analysis of GBS genomic sequences using the above techniques may 
take place in solution followed by standard resolution using methods such as gel 
electrophoresis. However in a preferred aspect of the invention, the 
primers/probes are immobilised onto a solid substrate to form arrays. 

The polynucleotide probes are typically immobilised onto or in discrete
regions of a solid substrate. The substrate may be porous to allow immobilisation
within the substrate or substantially non-porous, in which case the probes are
typically immobilised on the surface of the substrate. Examples of suitable solid
substrates include flat glass (such as borosilicate glass), silicon wafers, mica,
ceramics and organic polymers such as plastics, including polystyrene and
polymethacrylate. It may also be possible to use semi-permeable membranes
such as nitrocellulose or nylon membranes, which are widely available. The
semi-permeable membranes may be mounted on a more robust solid surface
such as glass. The surfaces may optionally be coated with a layer of metal, such
as gold, platinum or other transition metal.

Preferably, the solid substrate is a material having a rigid or
semi-rigid surface. In preferred embodiments, at least one surface of the
substrate will be substantially flat, although in some embodiments it may be
desirable to physically separate synthesis regions for different polymers with, for
example, raised regions or etched trenches. It is also preferred that the solid
substrate is suitable for the high density application of DNA sequences in discrete
areas of typically from 50 to 100 µm, giving a density of 10000 to 40000 cm⁻².
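
The stated density range follows from simple arithmetic if the discrete areas are assumed to lie on a square grid whose centre-to-centre spacing equals the stated size. That grid assumption and the function name are illustrative and are not taken from the text.

```python
def spots_per_cm2(pitch_um):
    """Spots per square centimetre on a square grid.

    pitch_um is the assumed centre-to-centre spacing in micrometres.
    """
    per_cm = 10_000 / pitch_um  # 1 cm = 10,000 micrometres
    return per_cm ** 2


print(spots_per_cm2(100))  # 100 spots per cm in each direction
print(spots_per_cm2(50))   # 200 spots per cm in each direction
```

A 100 µm spacing gives 100 x 100 = 10000 spots per cm², and a 50 µm spacing gives 200 x 200 = 40000 spots per cm², matching the range quoted above.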

The solid substrate is conveniently divided up into sections. This may be
achieved by techniques such as photoetching, or by the application of
hydrophobic inks, for example teflon-based inks (Cel-line, USA). Discrete
positions, at which each different probe is located, may have any convenient
shape, e.g., circular, rectangular, elliptical, wedge-shaped, etc.

Attachment of the library sequences to the substrate may be by covalent
or non-covalent means. The library sequences may be attached to the substrate
via a layer of molecules to which the library sequences bind. For example, the
probes may be labelled with biotin and the substrate coated with avidin and/or
streptavidin. A convenient feature of using biotinylated probes is that the
efficiency of coupling to the solid substrate can be determined easily. Since the
polynucleotide probes may bind only poorly to some solid substrates, it is often
necessary to provide a chemical interface between the solid substrate (such as in
the case of glass) and the probes. Thus, the surface of the substrate may be
prepared by, for example, coating with a chemical that increases or decreases
the hydrophobicity or coating with a chemical that allows covalent linkage of the
polynucleotide probes. Some chemical coatings may both alter the hydrophobicity
and allow covalent linkage. Hydrophobicity on a solid substrate may readily be
increased by silane treatment or other treatments known in the art. Examples of
suitable chemical coatings include polylysine and poly(ethyleneimine). Further
details of attachment methods are provided in US Patent No. 6,248,521.
Methods for immobilizing nucleic acids by introduction of various functional
groups to the molecules are also described in Bischoff et al., 1987 (Anal.
Biochem., 164:336-344) and Kremsky et al., 1987 (Nucl. Acids Res. 15:2891-
2910).

Techniques for producing immobilised arrays of nucleic acid molecules have
been described in the art. A useful review is provided in Schena et al., 1998,
TibTech 16: 301-306, which also gives references for the techniques described
therein.

Microarray-manufacturing technologies fall into two main categories:
synthesis and delivery. In the synthesis approaches, microarrays are prepared in
a stepwise fashion by the in situ synthesis of nucleic acids from biochemical
building blocks. With each round of synthesis, nucleotides are added to growing
chains until the desired length is achieved. A number of prior art methods describe
how to synthesise single-stranded nucleic acid molecule libraries in situ, using for
example masking techniques (photolithography) to build up various permutations of
sequences at the various discrete positions on the solid substrate. U.S. Patent No.
5,837,832 describes an improved method for producing DNA arrays immobilised to
silicon substrates based on very large scale integration technology. In particular,
U.S. Patent No. 5,837,832 describes a strategy called "tiling" to synthesize specific
sets of probes at spatially-defined locations on a substrate which may be used to
produce the immobilised DNA libraries of the present invention. U.S. Patent No.
5,837,832 also provides references for earlier techniques that may also be used.

The delivery technologies, by contrast, use the exogenous deposition of
preprepared biochemical substances for chip fabrication. For example, DNA may
be printed directly onto the substrate using robotic devices
equipped with either pins (mechanical microspotting) or piezoelectric devices (ink
jetting). In mechanical microspotting, a biochemical sample is loaded into a
spotting pin by capillary action, and a small volume is transferred to a solid
surface by physical contact between the pin and the solid substrate. After the first
spotting cycle, the pin is washed and a second sample is loaded and deposited to
an adjacent address. Robotic control systems and multiplexed printheads allow
automated microarray fabrication. Ink jetting involves loading a biochemical
sample, such as a polynucleotide, into a miniature nozzle equipped with a
piezoelectric fitting; an electrical current is then used to expel a precise amount of
liquid from the jet onto the substrate. After the first jetting step, the jet is washed
and a second sample is loaded and deposited to an adjacent address. A
repeated series of cycles with multiple jets enables rapid microarray production.

In one embodiment, the microarray is a high density array, comprising
greater than about 50, preferably greater than about 100 or 200 different nucleic
acid probes. Such high density arrays comprise a probe density of greater than
about 50, preferably greater than about 500, more preferably greater than about
1,000, most preferably greater than about 2,000 different nucleic acid probes per
cm². The array may further comprise mismatch control probes and/or reference
probes (such as positive controls).

Microarrays of the invention will typically comprise a plurality of
primers/probes as described above. The primers/probes may be grouped on the
array in any order. However, it may be desirable to group primers/probes
according to types (capsular polysaccharide gene serotypes, serosubtypes;
protein antigen gene subtypes; mobile genetic element subtypes), or groups of
types (capsular polysaccharide gene serotypes, serosubtypes; protein antigen
gene subtypes; mobile genetic element subtypes) for which they are specific.
Such grouping may be arranged such that the resulting patterns are easily
susceptible to pattern recognition by computer software.

Elements in an array may contain only one type of probe/primer or a 
number of different probes/primers. 

Detection of binding of GBS genomic DNA to immobilised probes/primers
may be performed using a number of techniques. For example, the immobilised
probes which are specific to a number of types (capsular polysaccharide gene
serotypes, serosubtypes; protein antigen gene subtypes; mobile genetic element
subtypes), may function as capture probes. Following binding of the genomic
DNA to the array, the array is washed and incubated with one or more labelled
detection probes which hybridise specifically to regions of the GBS genome
which are conserved. The binding of these detection probes may then be
determined by detecting the presence of the label. For example, the label may
be a fluorescent label and the array may be placed in an X-Y reader under a
charge-coupled device (CCD) camera.

Other techniques include labelling the genomic DNA prior to contact with 
the array (using nick-translation and labelled dNTPs for example). Binding of the 
genomic DNA can then be detected directly. 

It is also possible to employ a single PCR amplification step using labelled
dNTPs. In this embodiment, the genomic DNA fragment binds to a first primer
present in the array. The addition of polymerase, dNTPs (including some labelled
dNTPs) and a second primer results in synthesis of a PCR product incorporating
labelled nucleotides. The labelled PCR fragment captured on the plate may then
be detected.

A number of available detection techniques do not require labels but
instead rely on changes in mass upon ligand binding (e.g. surface plasmon
resonance, SPR). The principles of SPR and the types of solid substrates
required for use in SPR (e.g. BIACore chips) are described in Ausubel et al.,
1999, supra.


C. Uses 

As discussed above, group B streptococcus (GBS) - Streptococcus
agalactiae - is the commonest cause of neonatal and obstetric sepsis and an
increasingly important cause of septicaemia in the elderly and
immunocompromised patients. Thus, the detection methods, probes/primers and
microarrays of the invention may be used in the diagnosis of GBS infections in
pregnant women, elderly and/or immunocompromised patients. The PCR and
microarray techniques described herein may be of particular use in routine
antenatal screening of pregnant women as well as in diagnosing infections in
pregnant women given the increased accuracy and sensitivity compared to
conventional identification and serotyping. These methods are also likely to give
faster results since it will not generally be necessary to culture clinical samples to
obtain enough material. Further, the molecular techniques can be used in most
laboratories without the need for specialist expertise or reagents.

The molecular typing methods of the invention may also assist in
comprehensive strain identification that will be useful for epidemiological and
other related studies that will be needed to monitor GBS isolates before and after
introduction of GBS conjugate vaccines.

The present invention will now be described in more detail with reference
to the following examples, which are illustrative only and non-limiting. The
examples refer to Figures:

Detailed description of the Figures. 

Figure 1. Molecular serotype identification based on the sequence heterogeneity
of the 3'-end of cpsD-cpsE-cpsF and the 5'-end of cpsG (relevant primers are
shown).

Figure 2. Algorithm for GBS molecular serotype (MS) identification by PCR and 
sequencing. 


Figure 3. Multiple sequence alignments of the gene sequences of cpsG-cpsH-
cpsI/M for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons are
highlighted in bold).

Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158,
upper lines) and alp3 (AF291065, lower lines) used to distinguish them (relevant
primers are shown).

Figure 5. Genetic relationship of 194 invasive Australasian GBS strains (or 56
genotypes).

Notes for column headed "Genetic Markers of GBS genotypes": 
Protein antigen gene profile codes are: 
"A": 5'-end of bca positive;
"a" or "as": bca repetitive unit or bca repetitive unit-like region positive,
with multiple or single band amplicons, respectively;
"B": bac positive;
"R": rib positive;
"alp2": alp2 positive;
"alp3": alp3 positive;
"None": isolate contains none of the above protein genes.
The molecular markers in bold type show the common features in each cluster. 


Notes for column headed "No. of strains": 

After "+" are the numbers of CSF isolates; the others are blood isolates.

Notes for column headed "Genotypes": 

Each genotype was characterized by a distinct combination of the cps
genes, protein gene profiles and mobile genetic elements. The predominant
genotype in each serotype was named as the number "1" genotype of that
serotype.

Notes for the dendrogram: 

At about distance 16, the 56 genotypes could be separated into 8 clusters
(1-8); at about distance 22.5 the 56 genotypes could be separated into 3 cluster
groups (A, B, C).

EXAMPLES 

MATERIALS AND METHODS

GBS reference strains and clinical isolates. 

A panel of nine GBS serotypes (Ia to VIII) was kindly provided by Dr
Lawrence Paoletti, Channing Laboratory, Boston USA (reference panel 1). Dr
Diana Martin, Streptococcus Reference Laboratory, at ESR, Wellington, New
Zealand, provided another panel of nine international reference GBS type-strains
including serotypes Ia to VI (reference panel 2) (Table 1). In addition, we tested
isolates from 205 clinical cases including 146 which had been referred from
various laboratories in New Zealand for serotyping and 59 isolated from normally
sterile sites over a period of 10 years in one diagnostic laboratory in Sydney. One
culture was subsequently shown to be mixed, so 206 different isolates were
examined. Conventional serotyping (CS) was performed at the Streptococcus
Reference Laboratory, at ESR, Wellington, New Zealand, and MS at the Centre
for Infectious Diseases and Microbiology Laboratory Services, ICPMR, Sydney,
Australia.

The two panels of GBS reference strains and 63 selected clinical isolates
were studied in more detail, by sequencing >2200 base pairs (bp) of each to
identify appropriate sequences for use in MS. These and the remaining clinical
isolates were then used to evaluate the MS method and compare results with
those of CS. Typing by both methods was done initially without knowledge of
results of the other.

Bacterial isolates were retrieved from storage by subculture on blood agar
plates (Columbia II agar base supplemented with 5% horse blood) and incubated
overnight at 37°C.

Invasive GBS clinical isolates 

All 194 isolates used in the study of mobile genetic elements were
recovered from the blood (177) or CSF (17) of 191 patients (107 female, 80 male,
four sex unrecorded; three cultures each contained mixed growth of two GBS
serotypes). 108 isolates were from specimens submitted for culture to the Centre
for Infectious Diseases and Microbiology Laboratory Services, ICPMR, Sydney,
Australia during 1996-2001 and 83 were referred to the Institute of Environmental
Science and Research (ESR), Porirua, Wellington, New Zealand for serotyping,
from various diagnostic laboratories in New Zealand, during 1994-2000.

Patients were classified into age-groups for analysis of genotype
distribution as follows: neonatal, early onset (0-6 days); neonatal, late onset (7
days to 3 months); infant and child (4 months-14 years); young adult (15-45
years); middle-aged (46-60 years); elderly (>60 years).
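
The age-group classification above can be sketched as a small Python
function. This is an illustration only: the function name, the representation
of age as completed years, months and days, and the assignment of ages
between 3 and 4 months to the late-onset group are assumptions that are not
specified in the text.

```python
def age_group(years: int, months: int = 0, days: int = 0) -> str:
    """Map an age (completed years, months, days) to a study age-group.

    Assumption: an age is given in completed units, so 0 years, 3 months
    and 20 days still counts as "3 months" (late onset).
    """
    if min(years, months, days) < 0:
        raise ValueError("age components must be non-negative")
    if years == 0 and months == 0 and days <= 6:
        return "neonatal, early onset"   # 0-6 days
    if years == 0 and months <= 3:
        return "neonatal, late onset"    # 7 days to 3 months
    if years <= 14:
        return "infant and child"        # 4 months-14 years
    if years <= 45:
        return "young adult"             # 15-45 years
    if years <= 60:
        return "middle-aged"             # 46-60 years
    return "elderly"                     # >60 years


print(age_group(0, 0, 3))
print(age_group(0, 2, 10))
print(age_group(63))
```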

These isolates are mainly a subset of the isolates described above but 
with reference strains and non-invasive isolates excluded. 

Conventional serotyping (CS).

CS was performed using standard methodology (Wilkinson and Moody,
1969). Briefly, an acid-heated (56°C) extract was prepared for each isolate and
the serotype determined by immuno-precipitation of type-specific antiserum in
agarose. An isolate was considered positive for a particular serotype when the
precipitation occurring formed a line of identity with that of the control strain.
Antisera used were prepared at ESR in rabbits against serotypes Ia, Ib, Ic, II, III,
IV, V and the R protein antigen. Fourteen selected isolates, including six that



WO 03/025216 PCT/AU02/0128] 


were nontypable using antisera against serotypes I-V, six that initially gave
discrepant results between CS and MS and two separate isolates from a mixed
culture, were kindly tested using antisera against all serotypes by Abbie Weisner
and Dr. Androulla Efstratiou at Central Public Health Laboratory, Colindale,
London, UK.

Molecular serotype identification (MS); development of method. 

Oligonucleotide primers. 

The oligonucleotide primers used in this study, their target sites and
melting temperatures are shown in Tables 2, 6 and 10. Their specificities and
expected lengths of amplicons are shown in Tables 3, 7 and 11. The primers
were synthesised according to our specifications by Sigma-Aldrich (Castle Hill
NSW, Australia). Four previously published oligonucleotide primers, and a series
of new primers designed by us were used to sequence the genes of interest,
namely 16S/23S rRNA intergenic spacer region and partial cps gene cluster, or to
amplify unique sequences of individual GBS serotypes. Six previously published
oligonucleotide primers and a series of new primers designed by us were used to
sequence parts of and/or to specifically amplify genes encoding GBS surface
proteins. We also designed a series of primers to sequence parts of and/or to
specifically amplify five known GBS mobile genetic elements. Some were
designed with high melting temperatures (>70°C) to be used in rapid cycle PCR.

DNA preparation and polymerase chain reaction (PCR). 

Five individual GBS colonies or a sweep of culture were sampled using a
disposable loop and resuspended in 1 ml of digestion buffer (10 mM Tris-HCl, pH
8.0, 0.45% Triton X-100 and 0.45% Tween 20) in 2 ml Eppendorf tubes. The
tubes containing GBS suspension were heated at 100°C (dry block heater or
water bath) for 10 minutes then quenched on ice and centrifuged for 2 minutes at
14,000 rpm to pellet the cell debris. 5 µL of each supernatant containing
extracted DNA was used as template for PCR (Mawn et al., 1993).

PCR systems (25 µL for detection only, 50 µL for detection and
sequencing) were used as previously described (Kong et al., 1999). The
denaturation, annealing and elongation temperatures and times used were 96°C
for 1 second, 55-72°C (according to the primer Tm values or as previously
described) for 1 second and 74°C for 1 to 30 seconds (according to the length of
amplicons), respectively, for 35 cycles.
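
As an illustration of the cycling conditions just described, the sketch below
records them in a small Python data class and adds up the programmed hold
time. The class and method names are hypothetical, and the total ignores
temperature ramping between steps, so it understates the real run time on any
instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RapidCycleProgram:
    """Cycling conditions from the text; names are illustrative only."""

    anneal_temp_c: float            # 55-72 °C, chosen from the primer Tm
    extend_seconds: int             # 1-30 s, chosen from the amplicon length
    denature_temp_c: float = 96.0
    denature_seconds: int = 1
    anneal_seconds: int = 1
    extend_temp_c: float = 74.0
    cycles: int = 35

    def __post_init__(self):
        # Reject settings outside the ranges stated in the text.
        if not 55 <= self.anneal_temp_c <= 72:
            raise ValueError("annealing temperature must be 55-72 °C")
        if not 1 <= self.extend_seconds <= 30:
            raise ValueError("elongation time must be 1-30 s")

    def hold_seconds(self) -> int:
        """Sum of programmed hold times over all cycles (ramping excluded)."""
        per_cycle = self.denature_seconds + self.anneal_seconds + self.extend_seconds
        return self.cycles * per_cycle


program = RapidCycleProgram(anneal_temp_c=60, extend_seconds=30)
print(program.hold_seconds())
```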
10 µL of PCR products were analysed by electrophoresis on 1.5%
agarose gels, which were stained with 0.5 µg ethidium bromide mL⁻¹. For
detection and/or serotype identification, the presence of PCR amplicons of
expected length, shown by ultraviolet transillumination, was accepted as
positive. For sequencing, 40 µL volumes of PCR products were further purified by
the polyethylene glycol precipitation method (Ahmet et al., 1999).

Sequencing. 

The PCR products were sequenced using Applied Biosystems (ABI) Taq
DyeDeoxy terminator cycle-sequencing kits according to standard protocols. The
corresponding amplification primers or inner primers were used as the
sequencing primers.

Multiple sequence alignments and sequence comparison. 
Multiple sequence alignments were performed with the Pileup and Pretty
programs in the Multiple Sequence Analysis program group. Sequences were
compared using the Bestfit program in the Comparison program group. All programs are
provided in WebANGIS, ANGIS (Australian National Genomic Information
Service), 3rd version.


Surface protein gene profile codes 

Each isolate was given a protein gene profile code according to positive 
PCR results using various primer pairs, as shown in Table 7. 

Nucleotide sequence accession numbers.

The new sequence data described have been submitted to the GenBank
Nucleotide Sequence Databases and allocated the following accession numbers:
AF291411-AF291419 (16S/23S rRNA intergenic spacer regions for serotypes Ia
to VIII reference strains from reference panel 1); AF332893-AF332917,
AF363032-AF363060, AF367973, AF381030 and AF381031 (partial cps gene
clusters for two panels of reference strains (Table ) and selected representative
clinical isolates); AF367974 (partial bac gene sequence, with an insertion
sequence IS1381 from one isolate), AF362685-AF362704 (partial bac gene
sequences for all bac-positive isolates) and AF373214 (partial rib-like gene for
reference strain Prague 25/60, an R protein standard strain).

Previously reported sequence data referred to herein have appeared in the
GenBank Nucleotide Sequence Databases with the following accession numbers:
AB023574 (16S rRNA gene); U39765, L31412 (16S/23S rRNA intergenic spacer
regions); X68427 (S. oralis 23S rRNA gene); X72754 (cfb gene); AB028896 (cps
gene cluster for serotype Ia); AB050723 (partial cps gene cluster for serotype Ib);
AF163833 (cps gene cluster for serotype III); AF355776 (cps gene cluster for
serotype IV); AF349539 (cps gene cluster for serotype V); AF337958 (cps gene
cluster for serotype VI); M97256 (bca gene); X58470, X59771 (bac gene);
U58333 (rib gene); AF208158 (alp2 gene), AF291065-AF291072 (alp3 gene);
AFD54785 (IS1381); M22449 (IS861); Y14270 (IS1548); AF064785 (IS1381);
AF165983 (ISSa4); and AJ292930 (GBSi1).

Statistical analysis and dendrogram.

SPSS version 11 software was used for statistical analysis. A dendrogram
was formed using Average Linkage (between groups) and Hierarchical Cluster
Analysis in SPSS version 11 software. The presence or absence of each marker -
MS Ia, Ib, II, IV-VI, sst III-1 to III-4; pgp "A", "B", "R", "a", "as", "alp2", "alp3";
bac subgroups 1, 1a, 2, 3, 3a, 3b, 3c, 4, 4b, 5a, 7, 7a, 8, 9, 9a, 10, n1, n2;
and mge IS1381, IS861, IS1548, ISSa4, GBSi1 - was included in the analysis.
The genotypes were
each characterized by a distinct combination of the molecular serotyping (MS) or
sst, pgp and mge.
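
The clustering step can be illustrated with a minimal pure-Python sketch of
average linkage (between groups) applied to presence/absence marker profiles.
It is not the SPSS procedure itself: the distance measure (squared Euclidean,
which for 0/1 profiles equals the number of differing markers) is an
assumption, since the text does not state which measure was used, and the
three genotype names and their profiles are made-up toy data.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance; for 0/1 vectors this counts differing markers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage(profiles):
    """Agglomerate genotypes; return a list of (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]

    def between(c1, c2):
        # Mean distance over all pairs with one member in each cluster.
        total = sum(squared_euclidean(profiles[p], profiles[q])
                    for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average between-group distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((c1, c2, between(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical presence (1) / absence (0) profiles for three made-up genotypes.
toy_profiles = {
    "g1": (1, 0, 1, 0),
    "g2": (1, 0, 1, 1),
    "g3": (0, 1, 0, 0),
}

for left, right, distance in average_linkage(toy_profiles):
    print(left, right, distance)
```

Cutting the resulting merge list at a chosen distance gives the clusters, in
the same way that the dendrogram is cut at about distance 16 or 22.5 in the
notes to Figure 5.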

Example 1 - Study of inter- and intra-serotype/serosubtype sequence
heterogeneity in specific regions of the GBS genome and assessment of
suitability for molecular serotyping/serosubtyping.

Polymerase chain reaction. 

With two exceptions, all GBS-specific primer pairs produced amplicons of
the expected size from all reference strains and clinical isolates tested (Table 3).
The exceptions were Sag59/Sag190 and CFBS/CFBA. Both target the cfb gene,
but failed to produce amplicons from one clinical isolate, despite repeated
attempts. We assumed that this isolate either lacked the cfb gene or that the
gene was present in a mutant form. It has been suggested previously that PCR
targeting the cfb gene will not identify all GBS isolates (Hassan et al., 2000) and
that another primer pair based on 16S rRNA gene, DSF2/DSR1 (Ahmet et al.,
1999) was not entirely specific. Therefore, in this study, we used both primer
pairs (DSF2/DSR1 and Sag59/Sag190) to confirm all the isolates were GBS.


Sequence heterogeneity of 16S/23S rRNA intergenic spacer regions. 

The 16S/23S rRNA intergenic spacer regions were sequenced for the
serotypes Ia to VIII from reference panel 1. Multiple sequence alignment showed
differences between serotypes at only two positions: 207 (serotype V is T or C
[T/C], serotypes VII and VIII are C, others are T) and 272 (serotype III is T, others
G). These regions are therefore unsuitable for MS.
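
The conclusion above can be checked by grouping the nine serotypes by the
bases reported at the two variable positions. The following Python sketch
uses only the values given in the text; the dictionary and function names are
illustrative, and the mixed result for serotype V is encoded simply as "T/C".

```python
from collections import defaultdict

# Bases reported in the text at spacer positions 207 and 272, in that order.
SPACER_SITES = {
    "Ia": ("T", "G"), "Ib": ("T", "G"), "II": ("T", "G"),
    "III": ("T", "T"), "IV": ("T", "G"), "V": ("T/C", "G"),
    "VI": ("T", "G"), "VII": ("C", "G"), "VIII": ("C", "G"),
}


def group_by_signature(sites):
    """Collect serotypes that share the same pair of bases."""
    groups = defaultdict(list)
    for serotype, signature in sites.items():
        groups[signature].append(serotype)
    return dict(groups)


for signature, serotypes in group_by_signature(SPACER_SITES).items():
    print(signature, serotypes)
```

Five serotypes share one signature and two share another, so the spacer
region alone cannot separate most serotypes.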

Sequence heterogeneity at the 3'-end of cpsD-cpsE-cpsF and the 5'-end of
cpsG.

Using a series of primers targeting the 3'-end of cpsD-cpsE-cpsF and the
5'-end of cpsG, we amplified and sequenced 2226 or 2217 bp - depending on the
presence or absence of a nine-base repetitive sequence - from both panels of
reference strains (serotypes Ia to VII) and 63 selected clinical isolates.
Representative sequences were deposited into GenBank. See Table 1 for
GenBank accession numbers of reference panel strains.

Repetitive sequence. 

At the 3'-end region of cpsD, we found a nine-base repetitive sequence (TTA
CGG CGA) in most isolates of MS Ia and II, some of MS III, all of MS IV, V, and
VII, but none of the isolates of MS Ib or VI examined (Table 4). The presence or
absence of this repetitive sequence can be used to further subtype MS Ia, II and
III (see below).


Intra-serotype heterogeneity. 

In general, intra-serotype heterogeneity was low - there were minor random
variations in a few isolates of all serotypes except MS III, in which the
intra-serotype heterogeneity was more complex. MS III could be divided into four
sequence subtypes on the basis of heterogeneity at 22 positions - 62, 139, 144,
204, 300, 321, 429, 437, 457, 486, 602, 636, 971, 1026, 1194, 1413, 1501,
1512, 1518, 1527, 1629, and 2134 - and the presence or absence of the repetitive
sequence (at 78-86) (Table 4).

Among 60 MS III isolates (58 clinical isolates and two reference strains),
serosubtypes III-1 (30 isolates) and III-2 (22 isolates) were predominant. The
repetitive sequence was present in serosubtype III-1 but not III-2; there were
differences at seven other sites (139, 144, 204, 300, 321, 636, and 1629).

There were five isolates belonging to serosubtype III-3, which contained
the repetitive sequence and were identical with serosubtype III-1 at three variable
sites (139, 144, and 300) and with serosubtype III-2 at four (204, 321, 636 and
1629). Serosubtype III-3 differed from both serosubtypes III-1 and III-2 at seven
sites (486, 1026, 1413, 1512, 1518, 1527, and 2134). These seven sites in
serosubtype III-3 were identical with the corresponding sites of MS Ia.
There were three serosubtype III-4 isolates, whose sequences were nearly
identical with the corresponding sequence of MS II. The only exception was at
position 437, where the nucleotide was T in serosubtype III-4 (as in MS VII), and
C in MS II. This difference can be used (in addition to PCR, see below) to
differentiate serosubtype III-4 from MS II. Two serosubtype III-4 isolates
contained the repetitive sequence, and the other did not. Because of the small
number of serosubtype III-4 isolates, we did not use the repetitive sequence to
subtype them further.
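
The single-site distinction between serosubtype III-4 and MS II can be
sketched in Python as follows. The function name is hypothetical, and the
sketch assumes that the input sequence has already been found to be MS II-like
and is numbered in the same 1-based coordinates as the positions quoted above.

```python
def iii4_or_ii(sequence: str, position: int = 437) -> str:
    """Classify an MS II-like sequence by the base at one 1-based position."""
    if len(sequence) < position:
        raise ValueError("sequence does not reach the diagnostic position")
    base = sequence[position - 1].upper()
    if base == "T":
        return "serosubtype III-4"
    if base == "C":
        return "MS II"
    return "undetermined"


# Made-up sequence with T at position 437, for illustration only.
toy = "A" * 436 + "T" + "A" * 20
print(iii4_or_ii(toy))
```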

Inter-serotype heterogeneity.

There were 56 sites of heterogeneity between the eight MS (Table 4). The
most suitable sites, for use in PCR/sequencing for MS, were a group of 23 sites
nearest to the 3'-end of the region (Table 4, Figure 1). Firstly, they were
consistent across two panels of reference strains and most clinical isolates (the
only exceptions were the small number of serosubtypes III-3 and III-4 isolates,
see below). Secondly, they were relatively concentrated within a 790 bp region,
which is a convenient length for sequencing in a single reaction. Thirdly, they
contained enough heterogeneity sites to allow differentiation, with few exceptions,
of MS Ia-VII. Based only on this 790 bp region, serosubtype III-3 cannot be
distinguished from MS Ia, nor serosubtype III-4 from MS II. However, they can be
identified by MS III-specific PCR (see below).

Serotype VIII does not form amplicons with primer pairs targeting the 790
bp region, but can be identified by exclusion after PCR identification of GBS. In
this study, one MS VIII isolate was identified, for which none of the primer pairs
that amplify the 2226 bp region (in addition to those that amplify the 790 bp
region) produced amplicons. This result was confirmed by the use of serotype
VIII-specific antiserum.

Mixed serotype-specificities in single isolates. 

Eleven isolates were identified as one MS on the basis of the MS-specific
PCR and overall sequence (within the 2226/2217 bp segment) but their
sequences differed at some sites from isolates of the same MS and shared
site-specific characteristics of another. They included five serosubtype III-3 isolates
and three serosubtype III-4 (see above). One non-serotypable reference strain
(Prague 25/60), which was identified as MS II, differed from other MS II isolates
at five sites at the 5'-end of the region, and was identical with MS III at three of
these sites. Prague 25/60 MS III-specific PCR was negative. One clinical isolate
identified as CS II, and MS II on the basis of its overall sequence, had bases at
nine sites at the 5'-end of the region, that were characteristic of serotype Ib; MS
Ib-specific PCR was negative. Finally, one CS V reference strain (Prague 10/84)
had the same sequencing result as the corresponding sequence in GenBank
(AF349539), but both were different, at three sites at the 5'-end of the region,
from sequences of the other MS V strains that we studied.

All of these mixed-serotype specificities, except for those associated with
serosubtypes III-3 and III-4, occurred at the 5'-end region of the 2226/2217
fragment. This supported our selection of the 790 bp 3'-end as the sequencing
target for MS. Using this target, all MS were correctly identified except for MS III
belonging to serosubtypes III-3 and III-4, which can be identified by MS
III-specific PCR (see Example 2).

Example 2 - Molecular serotype identification (MS) based on MS-specific
PCR targeting the 3'-end of cpsG-cpsH-cpsI/cpsM.

Our sequence alignment results showed that there was significant
sequence heterogeneity in the 3'-end of cpsG-cpsH-cpsI/cpsM (Figure 3), which
makes it appropriate for use in the design of specific primer pairs for
differentiation of serotypes Ia, Ib, III, IV, V, and VI directly by PCR. To fulfil
possible additional future requirements - for example, development of multiplex
PCR and/or to allow further evaluation of the sequence typing method - we
designed several primer pairs for each serotype (Tables 2 & 3). Using two panels
of reference strains and the specified conditions, all primer pairs amplified DNA
only from the corresponding serotypes. When clinical isolates were tested, similar
results were obtained with two sets of MS-specific primer pairs. In general, more
stringent conditions (lower primer concentration, higher annealing temperatures)
could be used with primers generating smaller amplicons. Those selected for MS
are shown in Table 3 and Figure 2.

An MS was assigned, by PCR, to 179 of 206 (86.9%) clinical isolates as
follows: MS Ia 40; MS Ib 35; MS III 58 (including those previously identified as
serosubtypes III-3 and III-4); MS IV 7; MS V 36; MS VI 3.
Example 3 - Comparison of serotype identification results between MS
and CS.

After CS and MS had been completed, the results were compared. Initial
results were discrepant for 15 isolates, all but five of which (see below) were
resolved by retesting and/or correction of clerical errors.

The CS and MS/sequence subtyping results are shown in Table 5. An MS
was assigned to all isolates by PCR and/or sequencing, compared with 188 of
206 (91.3%) by CS. Specific PCR has not yet been developed for MS II and VIII,
so all MS II isolates were determined by sequencing only and one presumptive
MS VIII isolate was decided by exclusion (see Example 1). For all other isolates,
the results of PCR and sequencing were consistent, except for serosubtypes III-3
and III-4 and other minor sequence differences described above (Example 1). CS
results correlated well with PCR results.

Final CS and MS results were the same for all 188 isolates (100%) for
which results for both methods were available. Eighteen clinical isolates that were
non-serotypable by CS were assigned MS as follows: Ia, two; Ib, five; II, one;
serosubtype III-1, three; serosubtype III-2, one; V, five; and VI, one.
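
The counts and percentages reported in Examples 2 and 3 can be checked with a
few lines of Python. The sketch below only re-adds figures quoted in the text;
the variable and function names are illustrative.

```python
# Isolates assigned an MS by PCR alone (Example 2).
pcr_assigned = {"Ia": 40, "Ib": 35, "III": 58, "IV": 7, "V": 36, "VI": 3}

# MS assigned to isolates that were non-serotypable by CS (Example 3).
cs_nontypable = {"Ia": 2, "Ib": 5, "II": 1, "III-1": 3, "III-2": 1, "V": 5, "VI": 1}

TOTAL_ISOLATES = 206
CS_TYPED = 188


def percent(part, whole):
    """Percentage rounded to one decimal place, as quoted in the text."""
    return round(100 * part / whole, 1)


print(sum(pcr_assigned.values()), percent(sum(pcr_assigned.values()), TOTAL_ISOLATES))
print(CS_TYPED, percent(CS_TYPED, TOTAL_ISOLATES))
print(sum(cs_nontypable.values()), TOTAL_ISOLATES - CS_TYPED)
```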

Sequences (2217 bp) of three clinical isolates that we identified as MS VI
were identical with those for serotype VI reference strains and the corresponding
sequence in GenBank (AF337958).

Mixed culture. 

Four clinical isolates gave positive results with MS III-specific PCR, but
were provisionally identified as MS II by sequencing. Three were CS III and one
CS II, with a weak cross-reaction with serotype III antiserum. These isolates were
studied further by subculturing 12 individual colonies of each. All subcultures
were tested by MS III-specific PCR. All 12 colony subcultures of the three CS III
isolates were positive by MS III-specific PCR and the isolates were therefore
classified as serosubtype III-4 (see above). However, 11 of 12 colony subcultures
of the fourth isolate were negative by MS III-specific PCR; and one was positive
by MS III-specific PCR. It was therefore assumed that this was a mixed culture,
predominantly of MS/CS II. The one MS III-specific PCR positive colony was
subsequently identified as serosubtype III-2 and included as an additional clinical
isolate (total 206 in all).
Example 4 - Algorithm for serotype assignment of GBS by PCR and
sequencing

As an example of how the PCR and sequencing methods described above
may be used clinically to perform GBS serotype identification, we designed an
algorithm for clinical use. All the primers (except the inner sequencing primers)
used were given high melting temperature (>70°C), so rapid cycle PCR could be
used (Figure 2) (see Table 2 for primer sequences).
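
Figure 2 is not reproduced in this text, so the following Python sketch is
only one possible reading of the decision logic, pieced together from the
statements in Examples 1 to 3: MS-specific PCR for Ia, Ib, III, IV, V and VI,
sequencing of the 790 bp region for the remaining MS, and MS VIII by
exclusion. The function name, argument names and returned strings are
hypothetical, and the handling of more than one positive MS-specific PCR is an
assumption not covered by the text.

```python
PCR_TYPEABLE = {"Ia", "Ib", "III", "IV", "V", "VI"}  # MS with a specific PCR


def assign_ms(gbs_positive, ms_pcr_hits, sequence_ms=None):
    """Suggest a molecular serotype (MS) from PCR and sequencing results.

    gbs_positive: True if the GBS identification PCR was positive.
    ms_pcr_hits:  set of MS whose MS-specific PCR was positive.
    sequence_ms:  MS suggested by sequencing the 790 bp region, or None
                  if that region gave no amplicon.
    """
    hits = set(ms_pcr_hits)
    if not hits <= PCR_TYPEABLE:
        raise ValueError("no MS-specific PCR is described for: %s" % (hits - PCR_TYPEABLE))
    if not gbs_positive:
        return "not GBS"
    if len(hits) > 1:
        # Assumption: flagged for follow-up rather than guessed.
        return "unresolved"
    if hits == {"III"}:
        # III-3 and III-4 look like MS Ia and MS II in the 790 bp region.
        # Example 3 also subcultures single colonies to exclude a mixed culture.
        return {"Ia": "III-3", "II": "III-4"}.get(sequence_ms, "III")
    if hits:
        return hits.pop()
    if sequence_ms is None:
        return "VIII (by exclusion)"
    return sequence_ms


print(assign_ms(True, {"V"}))
print(assign_ms(True, set(), "II"))
print(assign_ms(True, {"III"}, "II"))
print(assign_ms(True, set(), None))
```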

Example 5 - Identification of regions in the alp2, alp3 and rib genes suitable
for protein antigen gene specific subtyping

Polymerase chain reactions. 

With few exceptions, all primer pairs produced amplicons of predicted
length from isolates giving positive results (Table 7). The exceptions included one
isolate that was positive by PCR using primer pairs GBS1360S/GBS1937A and
GBS1717S/GBS1937A (which both target bac gene) but produced amplicons
significantly longer than those of other bac gene-positive isolates. Sequencing
showed that the amplicon contained the insertion sequence IS1381 with minor
variations compared with the published sequences (Tamura et al., 2000). The
amplicons produced using primers IgAagGBS/RIgAagGBS and IgAS1/IgAA1
(also targeting bac gene) varied in length (Berner et al., 1999) and were
sequenced for further subtyping (see below and Table 8).

Amplicon sequencing results. 

To confirm the specificity of selected primer pairs that we had designed or
modified, we sequenced 10 of 23 amplicons produced by bcaS1/bcaA (targeting
the 5'-end of bca gene) and all of those produced by ribS1/ribA3 (targeting rib
gene) and GBS1360S/GBS1937A (targeting bac gene), from the two panels of
reference strains and 31 randomly selected clinical isolates.

All 10 amplicons of primers bcaS1/bcaA and 12 of 13 of primers
ribS1/ribA3 were identical with the corresponding gene sequences in GenBank
(M97256, bca gene and U58333, rib gene, respectively). One additional isolate,
namely Prague 25/60 in reference panel 2 (which is used to raise R antiserum),
produced an amplicon with primer pair ribS1/ribA3 only at a lower annealing
temperature (55°C) but not with ribS2/ribA1 and ribS2/ribA2. It was therefore
assumed not to contain rib gene, although the amplicon sequence showed
considerable homology with rib gene (71.4% or 66.6% according to whether or
not the primer sequences were included) (Figure 3). This isolate was the only
one, of 224 tested, for which PCRs were negative using ribS2/ribA1 and
ribS2/ribA2 but positive using ribS1/ribA3. The latter primer pair is assumed to be
not entirely specific for rib gene and was therefore used only for sequencing.

Four of 10 amplicons of primer pair GBS1360S/GBS1937A (targeting bac
gene) were identical with the corresponding sequence in GenBank (X58470,
X59771). A single point mutation (A to G, 1441 of X59771) was found in the
remaining six bac gene amplicons, including the one which contained the
insertion sequence IS1381 (see above and AF367974).

Amplicons from all of the 224 isolates that gave positive PCR results using
primer pairs bcaS1/balA (targeting alp2/alp3 genes), bal23S1/bal2A2 (targeting
alp2 gene) and IgAagGBS/RIgAagGBS (targeting bac gene) were sequenced.

Fifty isolates produced amplicons using primer pair bcaS1/balA. The
sequences of nine were identical with the corresponding portions of the published
sequence of alp2 gene (AF208158) and 41 with that of alp3 gene (AF291065).
There are two consistent heterogeneity sites between alp2 and alp3 genes in the
sequences of bcaS1/balA amplicons (Figure 4), which can be used to distinguish
them, in addition to alp2 and alp3 gene-specific PCR. All nine amplicons of
primer pair bal23S1/bal2A2 were identical with the corresponding portion of the
alp2 gene sequence in GenBank (AF208158).

The primer pair IgAagGBS/RlgAagGBS identified bac gene in 52 isolates.
There was considerable sequence variation, which allowed separation of bac
gene-positive isolates into 11 groups and 20 subgroups based on amplicon
length and sequence heterogeneity, respectively (Table 8). The groups contained
small numbers (one to five) of isolates except for B1 (20 isolates, 2 subgroups)
and B4 (11 isolates, 3 subgroups). The differences in amplicon length were
generally caused by the presence or absence of short repetitive sequences.
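
The two-level grouping described above (groups by amplicon length, subgroups by sequence within a group) can be sketched as follows. This is only an illustration under stated assumptions: the function name and input layout are hypothetical, sequences are assumed to be trimmed to the same amplicon boundaries, and the sketch does not reproduce the group labels (B1, B4 and so on) of Table 8.

```python
from collections import defaultdict


def group_amplicons(amplicons: dict) -> dict:
    """Map amplicon length -> {sequence -> [isolate names]}.

    The outer key corresponds to a 'group' (same amplicon length) and the
    inner key to a 'subgroup' (identical sequence within that length).
    """
    groups = defaultdict(lambda: defaultdict(list))
    for isolate, sequence in amplicons.items():
        sequence = sequence.upper()
        groups[len(sequence)][sequence].append(isolate)
    # Convert to plain dicts so the result is easy to inspect or serialise.
    return {length: dict(subgroups) for length, subgroups in groups.items()}
```

With four toy isolates, three of length 8 (two of them identical) and one of length 12, the function returns two groups, the first containing two subgroups.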

Further confirmation of specificity of surface protein gene-specific primer 
pairs. 

To confirm primer specificity, we compared the results of PCR using the
primer sequences we had designed or modified for bac gene PCR, with those of
PCR using previously published primers and found 100% correlation.

The previously reported non-specificity of the published primer pair
bcaRUS/bcaRUA (targeting the bca gene repetitive unit) was confirmed. Using
these primers, all nine alp2 gene-positive (bcaS1/bcaA-negative) isolates and 53
which were PCR negative using the primers bcaS1/bcaA, bcaS2/bcaA (targeting
the 5'-end of bca gene), bal23S1/bal2A2 and bal23S2/bal2A1 (targeting the
5'-end of alp2 gene) produced amplicons. Our sequencing showed that bca gene
and alp2 gene have significant homology in the regions targeted by
bcaRUS/bcaRUA, allowing amplicon formation from alp2 gene-positive strains. These
false positive results could be due to the presence of other C alpha-like proteins,
containing regions homologous with the bca gene repetitive unit (bca gene
repetitive unit-like sequence).

We also showed that the results of PCR using two or more primer pairs that
we had designed for individual genes (rib, alp2, and alp3 genes) correlated well,
supporting the specificity of each set. The only exception, as mentioned above,
was ribS1/ribA3, which produced a non-specific amplicon from one of 224
isolates tested.
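
The cross-check between primer pairs targeting the same gene can be expressed as a small comparison. The sketch below assumes PCR results are held as dictionaries mapping isolate name to a positive/negative flag; the function name is hypothetical, and apart from Prague 25/60 (whose discordant result is described above) the isolate names in the usage note are placeholders.

```python
def discordant_isolates(results_a: dict, results_b: dict) -> list:
    """Return isolates tested with both primer pairs whose results disagree."""
    shared = results_a.keys() & results_b.keys()
    return sorted(name for name in shared if results_a[name] != results_b[name])
```

For instance, with `ribS1_ribA3 = {"Prague 25/60": True, "isolate_x": True}` and `ribS2_ribA1 = {"Prague 25/60": False, "isolate_x": True}`, the function returns `["Prague 25/60"]`. An empty list indicates complete agreement between the two primer pairs for the isolates tested with both.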

Example 6 - The relationship between surface protein antigen gene profiles 
and cps serotypes/serosubtypes. 

Surface protein gene profiles. 

For each gene (except bca gene repetitive unit or bca gene repetitive unit-like
region), we selected two primer pairs to identify and characterise GBS
surface protein genes by PCR. Each isolate was given a protein gene profile code
according to PCR results as follows:

"A": 5'-end of bca gene amplified by bcaS1/bcaA and bcaS2/bcaA;
"a" or "as": bca gene repetitive unit or bca gene repetitive unit-like region
amplified by bcaRUS/bcaRUA, with multiple or single band amplicons,
respectively;
"B": bac gene amplified by GBS1360S/GBS1937A and
IgAagGBS/RlgAagGBS (>20 subgroups based on sequence
heterogeneity);
"R": rib gene amplified by ribS2/ribA1 and ribS2/ribA2;
"alp2": alp2 gene amplified by bal23S1/bal2A2 and bal23S2/bal2A1; and
"alp3": alp3 gene amplified by bal23S1/bal3A and bal23S2/bal3A
(Table 7).
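
The coding scheme above can be sketched as a function that builds a profile string from PCR results. The parameter names are hypothetical, and the concatenation order (family gene marker, then "a"/"as", then "B") is an assumption inferred from profile strings that appear in the text, such as "AaB", "alp2as" and "RB".

```python
def protein_gene_profile(bca_5prime: bool, bca_ru_bands: int, bac: bool,
                         rib: bool, alp2: bool, alp3: bool) -> str:
    """Build a protein gene profile code from PCR results.

    bca_ru_bands is the number of bands seen with bcaRUS/bcaRUA:
    0 = negative, 1 = single band ("as"), more than 1 = multiple bands ("a").
    """
    parts = []
    if bca_5prime:
        parts.append("A")
    if rib:
        parts.append("R")
    if alp2:
        parts.append("alp2")
    if alp3:
        parts.append("alp3")
    if bca_ru_bands > 1:
        parts.append("a")
    elif bca_ru_bands == 1:
        parts.append("as")
    if bac:
        parts.append("B")
    return "".join(parts) or "none"
```

An isolate positive for the 5'-end of bca gene, with multiple repetitive unit bands and bac gene, is coded "AaB"; an isolate positive only for alp2 gene with a single band is coded "alp2as".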

Four common profiles accounted for 203 of 224 (90.6%) isolates: "R" (62
isolates), "AaB" (51 isolates), "a" (49 isolates) and "alp3" (41 isolates) (see
Table 4). Only two isolates contained no surface protein gene markers. All but
one isolate with the bac gene ("B") also had bca gene, with its repetitive unit
("Aa"); one had rib gene. All "alp2" isolates contained single bca repetitive
unit-like sequences ("as"). "A", "R", "alp2" and "alp3" were all mutually exclusive.
Sixty-two of 63 isolates with rib gene ("R") and 41 of 41 isolates with alp3 gene
had no other protein antigen markers.

The relationship between surface protein antigen gene profiles and cps
serotypes/serosubtypes.

A cps molecular serotype (MS) was assigned to all isolates in accordance
with the methods described in Examples 1 to 4 and the results correlated with
conventional serotyping (CS) results except for 19 of 224 isolates that were
nontypable using antisera. The relationships between surface protein gene profiles
and cps MS are summarised in Table 9.

The following strong associations were confirmed or demonstrated
between: MS Ia and bca gene repetitive unit or bca gene repetitive unit-like
sequence (most with profile "a"); MS serosubtypes III-1 and III-2 and rib gene; MS
serosubtype III-3 and alp2 gene; MS Ib and bca/bac genes and MS V and alp3
gene. MS II showed the most varied surface protein gene profiles. However, the
relationships were not absolute and different combinations of cps serotypes and
protein gene profiles produced 31 different serovariants or 51 when bac gene
("B") subgroups were considered.

Example 7 - The relationship between surface protein antigens and protein 
gene profiles. 

Based on conventional serotyping, 33 isolates (belonging to CS Ia/c, Ib/c, IIc, IIb,
IIIc or IIIb) reacted with the C antiserum. The surface protein gene profiles of all
these isolates contained bca gene ("A") or bca gene repetitive unit-related
markers ("a" or "as"): Aa, 3; AaB, 18; a, 11; alp2as, 1. Twenty-nine isolates
reacted with the R antiserum and, of these, 22 contained rib gene and six, alp3
gene. The strain used to raise the R protein antiserum (Prague 25/60) contained
a presumed rib-like gene (see above and Figure 3).

Example 8 - Identification of mobile genetic elements suitable for molecular 
subtyping 

We developed a series of PCR primers to screen for the presence of five
mobile elements in GBS serotypes.

Specificity of primer pairs.

All the primer pairs produced amplicons of the expected lengths (Table 11)
from some reference and/or some clinical isolates (Table 12). To evaluate the
specificity of our primer pairs, we sequenced all amplicons produced by primers
IS1548S/IS1548A3 and ISSa4S/ISSa4A2, and amplicons, selected from both
reference and clinical isolates, produced by IS861S/IS861A2 (12 isolates),
IS1381S1/IS1381A (24 isolates) and GBSi1S1/GBSi1A2 (11 isolates).

All 41 IS1548 and 15 ISSa4 amplicon sequences were identical with the
corresponding sequences in GenBank (Y14270 and AF165983, respectively).
Five of 12 IS861 amplicon sequences were identical with the corresponding
IS861 sequence in GenBank (M22449). The other seven differed, at position 732,
from the published sequence (G to A) and the reference strain Prague 25/60 had
two additional differences - G to A and T to A - at positions 576 and 830 of
M22449, respectively.

Previously, we found a full-length insertion sequence IS1381 (AF367974)
within C beta antigen gene of a clinical isolate, with several differences compared
with the original published sequence (AF064785): the terminal inverted repeats
contained 15, rather than 20 base pairs (bp); there was a three bp deletion and
four individual bp differences in the putative transposase pseudogene between
positions 419 to 429 (of the original GenBank sequence) - GGG ATC CGA TT
(AF064785) vs CAG A- -GG TA (AF367974; our sequence). All amplicons of
primer pair IS1381S1/IS1381A from 12 reference and 12 selected clinical isolates
were identical with each other and with that of our IS1381 sequence in GenBank
(AF367974) but different, as above, from the original reported IS1381 sequence
(AF064785).

The amplicons of primer pair GBSi1S1/GBSi1A2 from all four GBSi1-positive
reference strains and seven selected clinical isolates were sequenced.
Six (including those of three reference strains) were identical with the
corresponding GBSi1 sequence in GenBank (AJ292930). Amplicons from four
clinical isolates showed three site-variations (C to T at position 767, A to C at
position 846 and T to C at position 923 of AJ292930 sequence). The reference
strain Prague 25/60 showed only the first two of these site-variations.

In addition to sequencing, we evaluated the specificity of our primer pairs
by comparing PCR results for two or more primer pairs for each target (Table 11).
In all cases, the same sets of isolates gave positive results when tested with PCR
targeting the same mobile genetic elements, thus confirming the specificity of the
primer pairs.

PCR results using specific primer pairs for all five mobile genetic elements. 

IS861, IS1548, IS1381, ISSa4 and GBSi1 were identified in 55%, 18%, 85%,
7% and 19% of isolates, respectively. None of the mobile elements was detected in
10 (4%) isolates. The distributions of the five mobile elements identified by PCR in
the 224 GBS isolates tested in the previous examples are shown in Table 12. IS1381
was detected alone in 79 isolates and GBSi1 alone in one. Forty-six isolates
contained two different insertion sequences (IS861 and IS1381, 42 isolates; IS1548
and IS1381, three isolates; ISSa4 and IS1381, one isolate). Forty-four isolates
contained three (IS861, IS1548 and IS1381, 34; IS861, ISSa4 and IS1381, 10) and
one contained all four insertion sequences. Forty-one isolates contained GBSi1 in
combination with one (IS861, 22; IS1381, one isolate), two (IS861 and IS1381, 11;
ISSa4 and IS1381, three isolates) or three (IS861, IS1548 and IS1381, four isolates)
insertion sequences.
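
Tallies of this kind can be produced by reducing each isolate to a canonical presence pattern and counting identical patterns. The following is a minimal sketch; the function names, the fixed element order and the isolate names in the usage note are illustrative assumptions, not part of the described method.

```python
from collections import Counter

# Fixed order so that the same combination always yields the same pattern.
MGE_ORDER = ("IS861", "IS1548", "IS1381", "ISSa4", "GBSi1")


def mge_pattern(detected) -> tuple:
    """Return the detected elements as a tuple in canonical order."""
    detected = set(detected)
    unknown = detected - set(MGE_ORDER)
    if unknown:
        raise ValueError(f"unknown mobile element(s): {sorted(unknown)}")
    return tuple(name for name in MGE_ORDER if name in detected)


def tally_patterns(isolates: dict) -> Counter:
    """Count isolates per combination; isolates maps name -> detected elements."""
    return Counter(mge_pattern(elements) for elements in isolates.values())
```

For example, `tally_patterns({"iso1": ["IS1381"], "iso2": ["IS1381", "IS861"], "iso3": ["IS861", "IS1381"], "iso4": []})` counts one isolate with IS1381 alone, two with IS861 plus IS1381, and one with no element (the empty tuple).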

PCR results for the 194 invasive isolates using specific primer pairs for all
five mobile genetic elements.

The numbers of isolates containing different mobile genetic element (mge)
combinations (from none to four per isolate) are shown in Table 13. IS1381, IS861,
IS1548, ISSa4 and GBSi1 were identified in 87%, 52%, 17%, 6% and 18% of
isolates, respectively. Six (3%) isolates contained no mge.

Example 9 - The relationships between cps serotypes, serosubtypes, surface 
protein gene profiles and mobile genetic elements. 

The distributions of each of the five mobile genetic elements in different cps
serotypes, serotype III subtypes and surface protein gene profiles are shown in
Tables 12 and 13. The most consistent findings for each serotype/serosubtype were:

1) Serotype Ia - most (>80%) expressed proteins closely related to C alpha
protein and contained IS1381.

2) Serotype Ib - most (>90%) expressed C alpha and C beta proteins and
contained IS861 and IS1381.

3) Serotype II - exhibited two common patterns:

a) >50% expressed C alpha protein (and often C beta) and contained IS861,
IS1381 and sometimes other mobile elements, especially ISSa4; or

b) >25% expressed Rib protein and contained IS861, IS1381 and GBSi1.

4) Serosubtype III-1 - all expressed Rib protein and contained IS861, IS1548 and
IS1381 but not GBSi1.

5) Serosubtype III-2 - all expressed Rib protein and contained IS861 and GBSi1
but neither IS1548 nor IS1381.

6) Serosubtype III-3 - all expressed C alpha-like protein 2 and contained no
mobile genetic elements.

7) Serosubtype III-4 - expressed various proteins; all contained GBSi1.

8) Serotype IV - most expressed proteins closely related to C alpha protein
and contained IS1381.

9) Serotype V - most expressed C alpha-like protein 3 and contained IS1381.

10) GBSi1 and IS1548 were mutually exclusive in serotype III (III-1, III-2 and III-4)
but not in serotype II.

11) All isolates that expressed C alpha-like protein 2 contained no insertion
sequences.

Predominant relationships between MS/sst, pgp and mge. 

Figure 5 shows the relationships between the various genetic markers. IS1381
was present in nearly all isolates of MS Ia, Ib, IV, V and VI, but in none of sst III-2 or
III-3. IS1548 was found exclusively, and GBSi1 most commonly, in serotypes II or III;
three isolates (all MS II) contained both GBSi1 and IS1548. IS861 was found in all sst
III-1 and III-2 and most MS II and Ib isolates but only in 14% of other MS isolates.
ISSa4 was present in only 6% of isolates, more than half of which were MS II; it was
present in one invasive isolate obtained before 1996 (1994). IS1381 was found in
most isolates except those in cluster 8, pgp "alp2", which had no insertion sequences.
IS861 was found in most genotypes with pgp "AaB" (clusters 3 and 4) and all
genotypes with pgp "R" (clusters 6 and 7).

Genotypes based on MS/sst, pgp, bac subtypes and mge. 

MS/sst, pgp, bac subtype (for isolates with pgp "B") and the presence of
various combinations of mge provide a PCR/sequencing-based genotyping system.
The 194 invasive isolates in this study represented seven serotypes, ten MS/sst, 41
subtypes based on the distributions of pgp and mge or 56 genotypes when bac
subtypes (mainly in MS Ib) were included (Figure 5).

Theoretical GBS clonal population structure. 

Theoretically there are 13 possible GBS MS/sst (eight MS - Ia, Ib, II, IV-VIII,
four sst III-1 to III-4 and cps gene cluster absent) and at least 10 pgp (none, "Aa", "AaB",
"a", "as", "R", "RB", "alp2as", "alp3" or "alp4a"). If the 22 bac subgroups identified so
far are included, there are up to 31 pgp. If the five mge were independently, randomly
distributed and present or absent, there would be 13 x 31 x 2^5 = 12,896 different possible
combinations of molecular markers. The fact that only 56 different combinations were
found (Figure 5) demonstrates that markers are not randomly distributed or, in other
words, these invasive Australasian GBS isolates have a clonal population structure. It
is possible, but unlikely, that these isolates represent a very limited number of GBS
genotypes.
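
The arithmetic behind the theoretical number of marker combinations can be checked with a short calculation. The variable names are illustrative; the counts are those given in the paragraph above.

```python
# Counts taken from the text above.
ms_sst = 8 + 4 + 1          # eight MS, four sst of serotype III, cps cluster absent
pgp_with_bac_subgroups = 31  # protein gene profiles when bac subgroups are included
mge_states = 2 ** 5          # each of five mobile elements is present or absent

theoretical = ms_sst * pgp_with_bac_subgroups * mge_states
observed = 56                # genotypes found among the 194 invasive isolates

fraction_observed = observed / theoretical
```

Here `theoretical` evaluates to 12,896, so the 56 observed genotypes correspond to well under one percent of the combinations that independent, random assortment would allow.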

The phylogenetic relationship of Australasian invasive GBS. 

The 56 genotypes formed eight clusters, separated at a genetic distance of
about 16 (or three cluster groups separated at a distance of about 22.5). The pgp
was the main determinant of cluster separation (Figure 5). Ninety-four percent of
isolates belonged to five MS (Ia, Ib, II, III and V), 62% belonged to five (9%) genotypes
(Ia-1, Ib-1, III-1, III-2, V-1) and 92% belonged to the five largest clusters (1, 2, 4, 6
and 7). Cluster group A, the largest, contained 139 (72%) isolates and 48 (86%)
genotypes, 45 of which contained fewer than five isolates, whereas cluster group
B contained 49 (25%) isolates and five (9%) genotypes.

The main characteristics of each cluster were as follows:

Cluster 1: "alp3", IS1381 (39 isolates, four MS, 11 genotypes; predominant
genotype V-1).

Cluster 2: "a" or "as", IS1381 (55 isolates, four MS, 12 genotypes; predominant
genotype Ia-1).

Cluster 3: "Aa" or "AaB", MS II, IS1381, IS861 (10 isolates, six genotypes).

Cluster 4: "AaB", IS1381, IS861 (35 isolates, two MS: VI or Ib; 18 genotypes;
predominant genotype Ib-1).

Cluster 5: "AaB", IS861, GBSi1, genotype III-4-1 (one isolate).

Cluster 6: "R", IS861 and GBSi1 (22 isolates, three MS/genotypes; predominant
genotype III-2).

Cluster 7: "R", IS1381 and IS861 (27 isolates; two MS/genotypes; predominant
genotype III-1).

Cluster 8: "alp2as", no IS (six isolates; three MS/genotypes; one contained
GBSi1).

The phylogenetic study showed that the dendrogram inferred by SPSS
was very robust.

The relationship between genotypes and GBS disease patterns. 

The distribution of MS and genotypes in different age groups of patients with
invasive GBS disease is shown in Table 14. All common MS were represented in
more than one patient group. However, there were highly significant associations
(when compared with all other age-groups) between sst III-2 and late onset neonatal
infection (p=0.0005) and between MS V and infection in the elderly (p=0.001).

There were 17 isolates from cerebrospinal fluid specimens, nine (53%) of
which were MS III (from three different sst/genotypes, each in a different cluster). The
other eight isolates were distributed among five MS, seven genotypes and four
clusters. Meningitis occurred in all age-groups but comprised 23% of cases in the late
onset neonatal group compared with 5% in all other groups.

DISCUSSION 

Capsule production in GBS is controlled by the capsular polysaccharide
synthesis (cps) gene cluster, which had been sequenced for serotype Ia and
serotype III before we began our study. Corresponding sequences for serotype Ib
(Miyake et al., 2001, submitted to GenBank, GenBank accession number:
AB050723), and for serotypes IV, V, and VI (McKinnon et al., 2001, submitted to
GenBank, GenBank accession numbers: AF355776, AF349539, AF337958,
respectively) were released recently when the project was nearly finished, but
the sequences of cps gene clusters for the other three serotypes (II, VII and VIII)
have not been published previously.

The sequences of cps gene clusters for serotypes Ia and III showed
considerable homology at the 3'-end of cpsD-cpsE-cpsF and the 5'-end of cpsG.
We designed a series of primers to amplify a 2226/2217 bp segment in this
region and found that amplicons were obtained from all serotypes except VIII.
This confirmed a previous suggestion that serotype VIII is significantly different
from other serotypes in this region.

Using eight serotype (Ia to VII) reference strains, we showed more than 50
heterogeneity points between serotypes (Figure 1, Table 4). Using 63 selected
clinical isolates that had been serotyped by conventional methods, we found that
these inter-serotype differences were generally consistent and specific, especially
the 23 sites clustered at the 3'-end of the regions. We used these differences to
assign serotypes to the remaining clinical isolates collected in this study, without
knowledge of the serotype obtained by conventional methods.

Sequence analysis of the 3'-end of cpsG-cpsH-cpsI/cpsM for serotypes Ia,
III, Ib, IV, V and VI showed that this region is highly variable (Figure 3), making
this region a suitable target for direct serotype identification by PCR. We
designed several pairs of MS-specific primers for MS Ia, Ib, III, IV, V and VI and
used them to test two CS reference panels. Selected primer pairs were used for
MS, by PCR alone, of 86.9% of our 206 clinical isolates. Using rapid-cycle
MS-specific PCR, results are available within one working day. In future, it will be
possible to extend this method to all MS, when cps gene cluster sequences in
this region are available for serotypes II, VII and VIII. Meanwhile, MS II and VII
can be identified by sequencing the 790 bp PCR amplicons of the 3'-end of
cpsE-cpsF and the 5'-end of cpsG (Figure 1, Table 4). A positive GBS-specific PCR and
negative PCR results with all the primers that amplify the 790 bp region identified MS
VIII, by exclusion.
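
The order of decisions described above can be summarised in a short sketch. The function name, argument names and returned strings are hypothetical; the sketch assumes that MS-specific PCR results are supplied as a collection of positive targets and that any sequence-based call is made separately and passed in.

```python
MS_SPECIFIC_TARGETS = ("Ia", "Ib", "III", "IV", "V", "VI")


def assign_molecular_serotype(gbs_specific_positive: bool,
                              ms_specific_positive,
                              amplicon_790_positive: bool,
                              sequence_call=None) -> str:
    """Assign MS following the PCR-first, sequencing-second order in the text."""
    if not gbs_specific_positive:
        return "not confirmed as GBS"
    hits = [ms for ms in MS_SPECIFIC_TARGETS if ms in set(ms_specific_positive)]
    if len(hits) == 1:
        return hits[0]                      # typed by MS-specific PCR alone
    if len(hits) > 1:
        return "unresolved"                 # conflicting MS-specific results
    if amplicon_790_positive:
        # MS II and VII are identified from the 790 bp amplicon sequence.
        return sequence_call or "sequencing required"
    return "VIII"                           # by exclusion
```

For example, an isolate that is GBS-specific PCR positive, negative with all MS-specific primers and negative for the 790 bp amplicon is returned as "VIII".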

In future, and in some laboratories currently, sequencing of the 790 bp
PCR amplicons of the 3'-end of cpsE-cpsF and the 5'-end of cpsG for all isolates may
be more convenient, as only one method and fewer primers are needed.
However, if sequencing is not available in-house, the turn-around time is longer
and a small proportion of serotypes would be wrongly assigned (serosubtypes
III-3 and III-4 as MS Ia and II, respectively). This could be avoided by screening with
MS III-specific PCR first. Sequencing the 790 bp PCR amplicon allows MS III to
be subtyped on the basis of the sequence heterogeneity.

Previous studies have shown that serotypes Ia, Ib, II, III, and V are those
most frequently isolated from normally sterile sites, in the United States and
several other countries. Serotypes VI and VIII are the predominant serotypes isolated
from patients in Japan, but are uncommon elsewhere. Although our isolates were
selected, they were probably representative of those causing disease in
Australasia; Ia, Ib, II, III, and V were the most common serotypes identified,
although there were small numbers of serotypes IV, VI and VIII.

Up to 13% of GBS isolates are non-serotypable and in our study the
proportion was 8.7% (18/206) using the antisera available. This may be due to
decreased type-specific-antigen synthesis; non-encapsulated phase variation; or
insertion or mutation in genes of cps gene clusters. One non-serotypable GBS
strain in our study had a T base deletion in cpsG gene, which caused a change in
the cpsG gene reading frame.

We have also developed PCR-based methods to identify GBS surface
protein genes and further characterise these isolates. Using the published bac
gene sequence, we modified bac gene-specific primers and designed new
primers, with high melting temperatures (>70 °C) suitable for rapid cycle PCR
targeting all major surface protein genes.

As previously reported, a published PCR primer pair targeting the bca
gene repetitive unit (at the 3'-end of bca gene) was not entirely specific for bca
gene. We designed two new primer pairs targeting the 5'-end of bca gene, to
improve the specificity. However, very few serotype Ia strains gave positive
results using these primers whereas all were PCR positive using primers
targeting the bca gene repetitive unit. These results were consistent with a
previous report, that a probe targeting the 5'-end of bca gene hybridized with only
one of nine serotype Ia strains, but a large bca gene probe, including the tandem
repeat region, hybridized with all nine strains.

PCR specific for rib, alp2 and alp3 genes has not been described
previously. The primer pairs we designed mainly targeted the 5'-ends of the genes
and were chosen after comparing the gene heterogeneity with related gene
sequences. We designed two or more primer pairs for each gene to check primer
specificity by comparison of results of different PCR targeting the same genes.
Protein gene profiles "alp2" and "alp3" were distinguished on the basis of the alp2
and alp3 gene-specific PCR and/or two sequence heterogeneity sites in the
amplicons of bcaS1/balA or bcaS2/balA.

To confirm the specificity of our primers, we used them to examine two
reference panels and selected GBS isolates. The longest amplicons produced by
PCR for each gene were sequenced, to provide maximal sequence information
and ensure that the inner primers were not located at strain heterogeneity sites.
Our sequencing results confirmed the specificity of the primers. Two pairs of
primers for each gene were compared, with similar results. Finally, six
gene/region specific primer pairs (including the one targeting the bca gene
repetitive unit) were used to define protein antigen gene profiles for all 224
isolates.

The study showed that only one member of the surface protein gene family
containing repetitive sequences - rib, bca, alp2, and alp3 genes - could be present
in any single isolate. However, all isolates containing bac gene, which is not a
member of the surface protein gene family containing repetitive sequences, also
contained either bca gene (51/52) or rib gene (1/52).

Bac gene was present in 23% of isolates, a similar proportion to that
(19-22%) previously reported. In common with others, we found variations in the bac
gene due to variable small internal repetitive sequences. These bac gene
repetitive sequences were irregular (unlike those of the bca-rib gene family).
Their role is not clear, but they are potentially useful molecular markers for
epidemiological studies.

Our data show that some serotype III isolates (our MS serosubtypes III-1
and III-2) were closely associated with rib gene, and others (our MS serosubtype
III-3) with alp2 gene. Serotype Ib was associated with bca and bac genes and
serotype V with alp3 gene. However, as the relationship was not absolute,
different combinations of cps serotypes-serosubtypes/protein gene profiles
identified many serovariants, which will be useful in epidemiological studies and
in formulation of conjugate vaccines. Based on PCR only, we were able to divide
our 224 isolates into 31 serovariants based on bac gene (B) groups or 51, based
on subgroups. Theoretically, there are likely to be additional serovariants.

We found that the antisera to "c" and "R" protein antigens were not entirely
specific for any particular protein genes. However, reaction with "c" antiserum
mostly reflected the presence of genes encoding C alpha (bca gene) and related
protein antigens (at least including alp2 gene) and the antiserum to "R" with those
encoding Rib (rib gene) and related proteins (at least including alp3 gene, and the
rare presumed rib-like gene).

We have also investigated the presence of a number of mobile elements in
different serotypes of GBS. Four different insertion sequences have been
identified previously in GBS. Multiple copies of IS861 in some serotype III
isolates were associated with increased capsule gene expression. We found
IS861 in all serosubtypes III-1 and III-2 and most serotype II and Ib isolates but
few others. All IS861-containing isolates contained at least one additional mobile
element.

Multiple copies of IS1381 have been found in a high proportion of GBS and
other Streptococcus species, including S. pneumoniae, and have been used as probes for
restriction fragment length polymorphism (RFLP) analysis of GBS for
epidemiological studies (Tamura et al., 2000). We found IS1381 in 85% of
isolates overall. It was present in all isolates of serosubtype III-1 but none of
serosubtypes III-2 or III-3. Our IS1381 sequences, from 24 isolates, were identical
with each other, but differed at several sites from that previously described
(AF064785). The significance of these differences is unknown, but it emphasizes
the importance of confirming sequences from as many different strains as
possible.

ISSa4 was first identified in a nonhemolytic GBS isolate, in which it caused
insertional inactivation of the gene cylB, which is part of an ABC transporter
involved in production of hemolysin. Only a small proportion of (mainly hemolytic)
GBS isolates (4%) contained ISSa4, all of which had been isolated since 1996
and it was postulated that ISSa4 had been newly acquired by GBS. We also
found ISSa4 in only a small proportion of isolates (7%) but it was present in
similar proportions of clinical isolates obtained before (4 of 44) and during or after
(11 of 162) 1996.

IS1548 was first discovered in some hyaluronidase-negative GBS
serotype III isolates, in which it caused insertional inactivation of the gene hylB
(one of a cluster responsible for production of hyaluronidase, an important GBS
virulence factor) (Granlund et al., 1998). A copy of IS1548 is also found
downstream of the C5a peptidase gene (also associated with virulence), in
isolates that contain it. Most IS1548-containing isolates were from patients with
endocarditis and it was postulated that inactivation of hyaluronidase production
and/or some effect on C5a peptidase may allow GBS isolates to adhere to and
survive on heart valves.

We found IS1548 in all serosubtype III-1 isolates, which represented 52%
of 58 serotype III isolates in our collection, from superficial (eight of 12) and
normally sterile (22 of 46) specimens. The latter were from neonates (seven of
20), adults (three of six) and subjects of unspecified age (12 of 20) (data not
shown). Although specific clinical data were unavailable, GBS endocarditis is
uncommon and likely to have been present in few, if any, of these subjects.
Further study is required to elucidate the association of this insertion sequence
with specific virulence factors and clinical syndromes.

We found GBSi1, a group II intron, in 19% of our 224 isolates overall; it
was commonly associated with IS861, and the distribution varied with
serotype/serosubtype. It was rarely found in serotypes other than II and III. It was
present in more than 50% of serotype II isolates, including four which also
contained IS1548. It was found in all serosubtype III-2 and III-4 isolates, in
which IS1548 was not found, but in no serosubtype III-1 isolates, which did
contain IS1548, or serosubtype III-3 isolates, which did not.
Our subdivision of GBS serotype III into four serosubtypes, based on
differences within the cps gene cluster, was supported by corresponding
differences in surface protein gene profiles and distribution of the five mobile
elements described in this study. Although we did not test our isolates for
hyaluronidase activity, it is likely that our serosubtype III-1, which expresses Rib
protein and contains IS1548, IS861 and IS1381, corresponds with the
hyaluronidase negative subtype III-2, described by Bohnsack et al., 2001. Our
serosubtype III-2 also expresses Rib protein and contains IS861 and GBSi1 and
probably corresponds with subtype III-3 of Bohnsack et al., 2001. Serosubtypes
III-3 and III-4 were represented by relatively few isolates. The former (in common
with some serotype Ia isolates) expressed the C alpha-like protein 2 and
contained no mobile elements (an otherwise uncommon finding). The latter is
closely related to serotype II, with which it shares sequence homology in a
section of the cps gene cluster and various surface protein profiles and mobile
elements.

Summary

Our aim has been to develop a comprehensive genotyping system for group B
streptococcus (GBS). Such a system should ideally be reproducible, objective and
transportable between laboratories, comparable with and complementary to other
typing methods and able to incorporate known virulence markers. Based on these
criteria, we first developed a molecular serotyping (MS) method based on the cps
gene cluster. It compared favourably with, but was more sensitive than, conventional
serotyping (CS) and allowed us to identify several subtypes of serotype (sst) III, as
described by others. We have also developed a second molecular subtyping method,
based on the family of genes encoding variable surface protein antigens
(bca/rib/alp2/alp3/alp4) and the IgA binding protein C beta (bac), which is more sensitive
and objective than conventional protein serotyping, which cannot type all isolates and
is sometimes misleading. Our methods can also identify more members of the family
of variable antigen genes and distinguish numerous bac subgroups. A third
subtyping method uses five mobile genetic elements (mge), including four different
insertion sequences (IS) and a group II intron, which have been identified in GBS. The
use of this third method further enhances the discriminatory ability of our genotyping
system.

We then used our typing system to examine the population genetic structure
and age-related disease distribution of genotypes among 194 invasive GBS isolates.

We used mainly invasive GBS isolates to demonstrate the practical value of
our genotyping system, confirm their clonal population structure and determine the
distribution of genotypes in different patient groups. The isolates originated from
patients of all ages with GBS sepsis. About half were consecutive GBS isolates from
blood or CSF, at a large diagnostic laboratory in a general adult hospital, with an
obstetric unit (i.e. there were no isolates from children other than neonates). The rest
were consecutive isolates referred for serotyping from all over New Zealand. Thus the
overall age distribution is representative of that in the population affected by GBS
disease, except that children beyond the early neonatal period are probably
under-represented. However, the distribution of genotypes within each age-group should be
representative.

Among our 194 Australasian invasive GBS isolates we identified 56
genotypes, of which five (Ia-1, Ib-1, III-1, III-2 and V-1) accounted for 62% of isolates.
The phylogenetic tree derived from our results showed relationships between
cps serotype and protein gene profiles (pgp). Our results also show that certain
known virulence markers - C beta, C alpha variants and hyaluronidase production
(indirectly) - were associated with distinct clonal lineages.

Our genotyping system, based on three sets of genetic markers, is highly
discriminatory. Because it provides useful phenotypic data, including antigenic
composition, it will be useful for epidemiological surveillance of GBS, especially in
relation to potential GBS vaccine use. Our genotyping system will facilitate study
of the relationships between putative high-virulence genotypes and patient
characteristics (age and/or underlying risk factors), and of whether there are
significant differences between CSF isolates (or genotypes) and other invasive or
colonising strains. Using this system, we have demonstrated a clonal population
structure among invasive Australasian GBS isolates. This system will be applied
to colonising GBS isolates, to identify markers of virulence.

Thus, we have developed an alternative to conventional serotyping for
GBS, which is accurate and reproducible, can be performed by any laboratory
with access to PCR/sequencing and, importantly, does not require panels of
serotype-specific antisera that are increasingly difficult to maintain. All isolates
are serotypable, and sequencing of a relatively limited 790 bp region can provide
additional serosubtyping information for MS III. The molecular methods we have
described for serotype identification, together with the protein profiling (or protein
antigen subtyping) and identification of mobile genetic elements (or mobile
genetic element subtyping), provide potentially useful markers for further
phylogenetic and epidemiological studies of GBS. They also provide comprehensive
strain identification for the epidemiological and other related studies that
will be needed to monitor GBS isolates before and after introduction of GBS
conjugate vaccines.

The various features and embodiments of the present invention, referred to in
individual sections above, apply, as appropriate, to other sections, mutatis
mutandis. Consequently, features specified in one section may be combined with
features specified in other sections, as appropriate.

All publications mentioned in the above specification are herein
incorporated by reference. Various modifications and variations of the described
methods and system of the invention will be apparent to those skilled in the art
without departing from the scope and spirit of the invention. Although the
invention has been described in connection with specific preferred embodiments,
it should be understood that the invention as claimed should not be unduly limited
to such specific embodiments. Indeed, various modifications of the described
modes for carrying out the invention which are readily apparent to those skilled in
molecular biology or related fields are intended to be within the scope of the
following claims.

Table 1. GBS reference panels used in this study.

Notes.

1. Reference panel 1: supplied by Dr Lawrence Paoletti, Channing Laboratory, 
Boston, USA. 

2. Reference panel 2: New Zealand Reference Medical Culture Collection strains 
supplied by Dr Diana Martin, ESR, Porirua, Wellington, New Zealand. 

3. MS III serosubtypes based on sequence heterogeneity; see text for more 
detail 

Table 3. Specificity and expected lengths of amplicons using different
oligonucleotide primer pairs.

Primer pairs*         Specificity             Length of amplicons (base pairs)

Sag59/Sag190 a        GBS (S. agalactiae)     196
CFBS/CFBA             GBS (S. agalactiae)     241
16SS/23SA             GBS (S. agalactiae)     433
DSF2/DSR1 a           GBS (S. agalactiae)     276
cpsDS/cpsEA1          serotypes Ia to VII     449/458
cpsES/cpsEA2          serotypes Ia to VII     424
cpsES1/cpsEA3         serotypes Ia to VII     505
cpsES2/cpsEFA         serotypes Ia to VII     515
cpsES3/cpsFA b        serotypes Ia to VII     450
cpsFS/cpsGA1 b        serotypes Ia to VII     423
cpsES3/cpsGA1 b       serotypes Ia to VII     790
cpsGS/cpsIA           serotypes Ia and III    1672/1558
cpsGS1/cpsIA          serotypes Ia and III    1662/1548
cpsGS/IacpsHA1        serotype Ia             1127
cpsGS1/IacpsHA1       serotype Ia             1117
IacpsHS/IacpsHA       serotype Ia             296
IacpsHS/IacpsHA1      serotype Ia             574
IacpsHS1/cpsIA c      serotype Ia             354
cpsGS/IbcpsHA1        serotype Ib             1468
cpsGS1/IbcpsHA1       serotype Ib             1458
cpsGS/IbcpsIA         serotype Ib             1660
cpsGS1/IbcpsIA        serotype Ib             1650
IbcpsHS/IbcpsHA       serotype Ib             282
IbcpsHS1/IbcpsHA1     serotype Ib             349
IbcpsHS2/IbcpsIA      serotype Ib             347
IbcpsIS/IbcpsIA1 c    serotype Ib             523
cpsGS/IIIcpsHA        serotype III            1063
cpsGS1/IIIcpsHA       serotype III            1053
IIIcpsHS/IIIcpsHA     serotype III            543
IIIcpsHS/cpsIA c      serotype III            641
cpsGS/IVcpsHA         serotype IV             1372
cpsGS1/IVcpsHA        serotype IV             1362
cpsGS/IVcpsMA         serotype IV             1686
cpsGS1/IVcpsMA        serotype IV             1676
IVcpsHS/IVcpsHA       serotype IV             400
IVcpsHS1/IVcpsMA c    serotype IV             379
cpsGS/VcpsHA1         serotype V              1096
cpsGS1/VcpsHA1        serotype V              1086
cpsGS/VcpsMA          serotype V              1682
cpsGS1/VcpsMA         serotype V              1672
VcpsHS1/VcpsHA        serotype V              349
VcpsHS1/VcpsHA1       serotype V              401
VcpsHS2/VcpsMA c      serotype V              374
VIcpsHS1/VIcpsHA      serotype VI             398
cpsGS/VIcpsHA1        serotype VI             1205
cpsGS1/VIcpsHA1       serotype VI             1195
cpsGS/VIcpsIA         serotype VI             1527
cpsGS1/VIcpsIA        serotype VI             1517
VIcpsHS/VIcpsHA1 c    serotype VI             327
VIcpsHS1/VIcpsIA      serotype VI             360

Notes.

*See Table 2 for primer sequences and Figure 1 for some primer sites.
Primers used in the algorithm for molecular serotype identification (Figure 2):
a. to identify GBS; b. for sequencing; c. for MS-specific PCR.

Table 5. Comparison of the results of conventional serotyping (CS) and
molecular serotype identification (MS)/subtyping of 206 clinical GBS
isolates.

                MS/serosubtype
CS              Ia    Ib    II    III-1¹  III-2¹  III-3¹  III-4¹  IV    V     VI    VIII

Ia              38    -     -     -       -       -       -       -     -     -     -
Ib              -     30    -     -       -       -       -       -     -     -     -
II              -     -     25    -       -       -       -       -     -     -     -
III             -     -     -     27      20      4       3       -     -     -     -
IV              -     -     -     -       -       -       -       7     -     -     -
V               -     -     -     -       -       -       -       -     31    -     -
VI              -     -     -     -       -       -       -       -     -     2     -
VIII            -     -     -     -       -       -       -       -     -     -     1
NT¹             2     5     1     3       1       -       -       -     5     1     -
Total (206)²    40    35    26²   30      21²     4       3       7     36    3     1

Notes.

1. For details of MS III serosubtypes see text.

2. One mixed culture was included as two separate isolates (one serotype II, one
subtype III-2).
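
As a rough cross-check on Table 5, the proportion of isolates that conventional serotyping could type can be worked out from the Total and NT rows alone. The sketch below is illustrative only: the variable and function names are hypothetical, and the counts are simply copied from the table.

```python
# Counts copied from Table 5, keyed by molecular serotype/serosubtype (MS).
# ms_totals is the Total row; cs_nontypable is the NT row (isolates that
# conventional serotyping, CS, could not type). Names are illustrative only.
ms_totals = {"Ia": 40, "Ib": 35, "II": 26, "III-1": 30, "III-2": 21,
             "III-3": 4, "III-4": 3, "IV": 7, "V": 36, "VI": 3, "VIII": 1}
cs_nontypable = {"Ia": 2, "Ib": 5, "II": 1, "III-1": 3, "III-2": 1,
                 "V": 5, "VI": 1}


def cs_typable_summary(totals, nontypable):
    """Return (all isolates, CS non-typable isolates, CS-typable fraction)."""
    n_all = sum(totals.values())
    n_nt = sum(nontypable.values())
    return n_all, n_nt, (n_all - n_nt) / n_all


n_all, n_nt, fraction = cs_typable_summary(ms_totals, cs_nontypable)
print(f"{n_all} isolates, {n_nt} non-typable by CS, "
      f"{100 * fraction:.1f}% typable by CS")
```

On these counts, 18 of the 206 isolates were non-typable by CS, whereas every isolate received an MS assignment.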

Table 7. Specificity and expected lengths of amplicons using different
primer pairs.

Primer pairs*           Specificity                      Length of amplicons   Protein profile
                                                         (base pairs)          code

IgAagGBS/RIgAagGBS *    bac                              532-838               B
IgAS1/IgAA1             bac                              303-691               B
GBS1360S/GBS1937A       bac                              652                   B
GBS1717S/GBS1937A       bac                              292                   B
bcaS1/bcaA              5'-end of bca                    390                   A
bcaS2/bcaA              5'-end of bca                    342                   A
bcaRUS/bcaRUA           bca repetitive unit or bca       [illegible]           a or as
                        repetitive unit-like region
bcaS1/balA              alp2/alp3                        446                   alp2 or alp3
bcaS2/balA              alp2/alp3                        398                   alp2 or alp3
balS/balA               alp2/alp3                        302                   alp2 or alp3
bal23S1/bal2A1          alp2                             334                   alp2
bal23S2/bal2A1          alp2                             253                   alp2
bal23S1/bal2A2          alp2                             426                   alp2
bal23S2/bal2A2          alp2                             345                   alp2
bal23S1/bal3A           alp3                             321                   alp3
bal23S2/bal3A           alp3                             240                   alp3
#ribS1/ribA3            rib/rib-like                     355                   R/r
ribS2/ribA1             rib                              194                   R
ribS2/ribA2             rib                              225                   R
ribS2/ribA3             rib                              333                   R

Notes.

*See Table 6 for primer sequences.

#For sequencing use only, not entirely specific for rib gene (see text for more
detail).

Table 8. Genetic groups and subgroups of bac gene (C beta protein gene)
based on amplicon length (using primers IgAagGBS/RIgAagGBS) and
sequence heterogeneity.

Group or    N=    Amplicon   GenBank      No. of different sites     Molecular serotype/
subgroup          length     accession    compared with (c.f.)       serosubtypes*
                             numbers      main group

B1          19    532        X58470       -                          17 = Ib; 2 = II
B1a         1     532        AF362686     1 (c.f. B1)                Ib
B2          3     550        AF362687     -                          Ib, II, III-4
B3          2     586        AF362688     -                          2 = Ib
B3a         1     586        AF362689     4 (c.f. B3)                V
B3b         1     586        AF362690     21 (c.f. B3)               VI
B3c         1     586        AF362691     24 (c.f. B3)               Ib
B4          8     604        AF362692     -                          4 = Ib; 4 = II
B4a         1     604        AF362693     1 (c.f. B4)                II
B4b         2     604        AF362694     2 (c.f. B4)                2 = Ib
B5          2     622        AF362695     -                          Ia, VI
B5a         1     622        AF362696     2 (c.f. B5)                Ia
B6          1     640        AF362697     -                          Ib
B7          1     658        AF362698     -                          Ib
B7a         1     658        AF362699     34 (c.f. B7)               VI
B8          1     712        AF362700     -                          Ib
B9          2     748        AF362701     -                          2 = II
B9a         1     748        AF362702     13 (c.f. B9)               Ib
B10         2     820        AF362703     -                          2 = Ib
B11         1     838        AF362704     -                          Ib

Note.

*See Table 9 for further details of serotype/serosubtype relationships with protein
antigens.
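
The grouping by amplicon length in Table 8 can be pictured as a simple lookup. The following is a minimal sketch, not the procedure used in the study: the function name, the `tolerance` parameter and the nearest-length rule are assumptions made for illustration, and only the lengths themselves come from the table. Subgroups (B1a, B3a and so on) share the length of their main group and are distinguished by sequence, so they are outside the scope of this lookup.

```python
# Expected amplicon lengths (bp) of the main bac groups, copied from Table 8.
BAC_GROUP_LENGTHS = {
    "B1": 532, "B2": 550, "B3": 586, "B4": 604, "B5": 622, "B6": 640,
    "B7": 658, "B8": 712, "B9": 748, "B10": 820, "B11": 838,
}


def bac_group(amplicon_length, tolerance=0):
    """Return the main bac group closest to the observed length,
    or None if no group lies within `tolerance` bp (hypothetical rule)."""
    group, expected = min(BAC_GROUP_LENGTHS.items(),
                          key=lambda item: abs(item[1] - amplicon_length))
    return group if abs(expected - amplicon_length) <= tolerance else None


# Remainders of (length - 532) modulo 18 across all listed groups.
steps = sorted({(length - 532) % 18 for length in BAC_GROUP_LENGTHS.values()})

print(bac_group(586))                # B3
print(bac_group(600, tolerance=5))   # B4
print(bac_group(700, tolerance=5))   # None
print(steps)                         # [0]
```

The last line shows that every listed length differs from the shortest (532 bp) by a multiple of 18 bp, so a tolerance below 9 bp cannot match two groups at once.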

Table 9. The relationship between GBS protein gene profiles and capsular
polysaccharide (cps) molecular serotypes/serosubtypes.

Serotype/       N=     None   Aa     AaB    R      alp3   a      as     alp2as  RB     Ra
serosubtype*

Ia              43     -      -      2      -      -      35     3      3       -      -
Ib              37     -      1      35     -      1      -      -      -       -      -
II              29     -      3      10     8      2      5      -      -       -      1
III-1           30     -      -      -      30     -      -      -      -       -      -
III-2           22     -      -      -      22     -      -      -      -       -      -
III-3           5      -      -      -      -      -      -      -      5       -      -
III-4           3      -      -      1      -      1      -      -      1       -      -
IV              9      -      -      -      1      -      8      -      -       -      -
V               38     1      -      -      1      35     -      -      -       1      -
VI              5      -      1      3      -      -      1      -      -       -      -
VII             1      -      -      -      -      1      -      -      -       -      -
VIII            2      1      -      -      -      1      -      -      -       -      -
Total           224    2      5      51     62     41     49     3      9       1      1

Note.

*See text for explanation of cps serosubtypes and Table 7 for explanation of
protein antigen gene profile codes.

Table 10. Oligonucleotide primers used in this study.

Primer      Target   Tm °C¹        GenBank accession    Sequence²
                                   numbers

IS861S      IS861    77.4          M22449               445 GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG 479
IS861A1     IS861    77.3          M22449               831 CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C 795
IS861A2     IS861    76.1          M22449               1020 CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG 985
IS1548S     IS1548   76.5          Y14270               143 CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC 178
IS1548S1    IS1548   77.0          Y14270               539 GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG 574
IS1548A1    IS1548   77.0          Y14270               574 CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC 539
IS1548A2    IS1548   70.3          Y14270               915 CCC AAT ACC ACG TAA CTT ATG CCA TTT G 888
IS1548A3    IS1548   78.0          Y14270               930 CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC 893
IS1381S1    IS1381   80.1          AF064785/AF367974    272/818 CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG 307/853
IS1381S2    IS1381   81.7          AF064785/AF367974    497/1040 GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG 526/1069
IS1381A     IS1381   73.1          AF064785/AF367974    881/1424 CTA AAA TCC TAG TTC ACG GTT GAT CAT TCC AGC 849/1392
ISSa4S      ISSa4    78.5          AF165983             326 CGT ATC TGT CAC TTA TTT CCC TGC GGG TGT CTC C 359
ISSa4A1     ISSa4    75.2          AF165983             639 GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G 606
ISSa4A2     ISSa4    74.5          AF165983             780 CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC 745
GBSi1S1     GBSi1    78.6          AJ292930             721 CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG 758
GBSi1S2     GBSi1    77.3          AJ292930             789 GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC 824
GBSi1A1     GBSi1    [illegible]   AJ292930             1058 AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC 1024
GBSi1A2     GBSi1    80.5          AJ292930             1161 CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG 1127

Notes.

1. The primer Tm values were provided by the primer synthesiser (Sigma-Aldrich).

2. Numbers represent the numbered base positions at which primer sequences
start and finish (numbering start point "1" refers to the start point "1" of
corresponding gene GenBank accession number).

Table 11. Specificity and expected lengths of amplicons using different
oligonucleotide primer pairs.

Primer pairs*         Specificity   Length of amplicons (base pairs)

IS861S/IS861A1        IS861         387
IS861S/IS861A2        IS861         576
IS1548S/IS1548A1      IS1548        432
IS1548S/IS1548A2      IS1548        773
IS1548S/IS1548A3      IS1548        788
IS1548S1/IS1548A2     IS1548        377
IS1548S1/IS1548A3     IS1548        392
IS1381S1/IS1381A      IS1381        610/607#
IS1381S2/IS1381A      IS1381        385
ISSa4S/ISSa4A1        ISSa4         314
ISSa4S/ISSa4A2        ISSa4         455
GBSi1S1/GBSi1A1       GBSi1         338
GBSi1S1/GBSi1A2       GBSi1         441
GBSi1S2/GBSi1A1       GBSi1         270
GBSi1S2/GBSi1A2       GBSi1         373

Notes.

*See Table 10 for primer sequences.

# Our sequencing result (GenBank accession number AF367974) was 3 bp
shorter than that previously described by Tamura et al., 2000 (GenBank accession
number: AF064785).
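
The expected amplicon lengths in Table 11 follow from the primer coordinates in Table 10: the length is the distance between the 5' start positions of the sense and antisense primers, counted inclusively. The sketch below is illustrative only; the dictionary and function names are hypothetical, the coordinates are copied from Table 10, and the IS1381 positions use the AF064785 numbering.

```python
# 5' start positions of each primer, copied from Table 10 (GenBank numbering).
# IS1381 positions follow AF064785. Names here are illustrative only.
PRIMER_5P_START = {
    "IS861S": 445, "IS861A1": 831, "IS861A2": 1020,
    "IS1548S": 143, "IS1548S1": 539,
    "IS1548A1": 574, "IS1548A2": 915, "IS1548A3": 930,
    "IS1381S1": 272, "IS1381S2": 497, "IS1381A": 881,
    "ISSa4S": 326, "ISSa4A1": 639, "ISSa4A2": 780,
    "GBSi1S1": 721, "GBSi1S2": 789, "GBSi1A1": 1058, "GBSi1A2": 1161,
}


def amplicon_length(sense, antisense):
    """Inclusive distance between the 5' ends of a sense/antisense primer pair."""
    return PRIMER_5P_START[antisense] - PRIMER_5P_START[sense] + 1


for pair in [("IS861S", "IS861A1"), ("IS1548S1", "IS1548A3"),
             ("IS1381S1", "IS1381A"), ("GBSi1S2", "GBSi1A2")]:
    print("/".join(pair), amplicon_length(*pair))
```

With the AF367974 coordinates for IS1381S1 and IS1381A (818 and 1424), the same formula gives 607 bp, which matches the second value listed for that pair.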



WO 03/025216 



PCT/AU02/01281 



Table 12. Relationship between mobile genetic elements and capsular
polysaccharide serotypes, serotype III subtypes and surface protein gene
profiles.

Serotype/     Protein gene   N=     IS861      IS1548    IS1381   ISSa4    GBSi1     No mobile
serosubtype   profile                                                                element

Ia            AaB            2      2          -         2        -        -         -
Ia            alp2as         3      -          -         -        -        -         3
Ia            a              35     3          1         35       1        -         -
Ia            as             3      -          -         3        -        -         -
subtotal                     43     5          1         40       1        -         3
Ib            Aa             1      -          -         -        -        -         1
Ib            AaB            35     30         -         35       1        -         -
Ib            alp3           1      -          -         1        -        -         -
subtotal                     37     30         -         36       1        -         1
II            Aa             3      3          1         3        2        1         -
II            AaB            10     10         5         10       5        1         -
II            alp3           2      1          1         2        -        -         -
II            R              8      8          -         8        -        8         -
II            Ra             1      1          -         -        -        1         -
II            a              5      2          2         5        3        5         -
subtotal                     29     25         9         28       10       16        -
III-1         R              30     30         30        30       1        -         -
III-2         R              22     22         -         -        -        22        -
III-3         alp2as         5      -          -         -        -        -         5
III-4         AaB            1      1          -         1        -        1         -
III-4         alp2as         1      -          -         -        -        1         -
III-4         alp3           1      -          -         1        -        1         -
subtotal                     60     53         30        32       1        25        5
IV            R              1      1          -         1        -        1         -
IV            a              8      2          -         8        -        -         -
subtotal                     9      3          -         9        -        1         -
V             alp3           35     3          1         35       1        1         -
V             R              1      1          -         1        1        -         -
V             RB             1      1          -         1        -        -         -
V             none           1      -          -         -        -        -         1
subtotal                     38     5          1         37       1        1         2
VI            Aa             1      -          -         1        -        -         -
VI            AaB            3      3          -         3        -        -         -
VI            a              1      -          -         1        -        -         -
subtotal                     5      3          -         5        -        -         -
VII           alp3           1      -          -         1        -        -         -
VIII          alp3           1      -          -         1        -        -         -
VIII          none           1      -          -         1        -        -         -
subtotal                     2      -          -         2        -        -         -
Total                        224    124 (55)   41 (18)   190      15 (7)   43 (19)   10 (4)

Note.

A: 5'-end of bca gene (C alpha protein);
a: bca gene repetitive unit or bca gene repetitive unit-like sequence (multiple band
amplicon);
as: bca gene repetitive unit or bca gene repetitive unit-like sequence (single band
amplicon);
B: C beta/IgA binding protein (bac) gene;
R: Rib protein (rib) gene;
alp2: C alpha-like protein 2 (alp2) gene;
alp3: C alpha-like protein 3 (alp3) gene;
r: assumed Rib-like protein gene.

Table 13. Distribution of mobile genetic elements among 194 invasive
GBS isolates.

                Mobile genetic elements present
Total N         IS1381      IS861       IS1548      ISSa4       GBSi1       None

6                           -           -           -           -
78              78          -           -           -                       -
2                                       -           -                       -
37              37          37                      -           -           -
1               1           -           1                       -           -
3               3                                   3
29              29          29          29
6               6           6                       6
8               8           8                                   8
18                          18                                  18
1               1                                               1
1               1                       1                       1
2               2           2           2                       2
2               2                                   2           2
Total (n=194)   168 (87%)   100 (52%)   33 (17%)    11 (6%)     34 (18%)    6 (3%)

Note.

Data are numbers of isolates containing various combinations of mge.

Relationship between GBS genotypes and invasive disease age.

Serotype/     Age-group/disease¹
Genotype      0-6 d         7 d-3 m       4 m-14 yr     15-45 yr      46-60 yr      >60 yr         Total

Ia-1          14            4+1           1             7             3             6              35+1 (19%)
Ia-(2-8)      4             2             -             1             -             3              10
Ia total      18 (34%)      6+1 (21%)     1 (10%)       8 (28%)       3 (18%)       9 (17%)        45+1 (24%)
Ib-1          2             1+1           -             3             2             5+1            13+2
Ib-(2-16)     3             4+2           -             3             1             5              16+2
Ib total      5 (9.4%)      5+3 (24%)     -             6 (21%)       3             10+1           29+4 (17%)
II            8 (15%)       1 (3%)        -             4+1 (17%)     1             4 (7%)         18+1 (10%)
III-1         6+1 (13%)     4 (12%)       1+1 (20%)     1+1 (7%)      6+1 (41%)     4              22+4 (13%)
III-2         5 (9%)        5+4 (39%)³    1 (10%)       2             -             -              13+4 (9%)
III-(3-4)     1+1           1             -             1             1             1              5+1
III total     12+2 (26%)    10+4 (41%)    2+1 (30%)     4+1 (17%)     7+1 (44%)     5 (9%)         40+9 (25%)
IV total      3             -             -             -             -             4              7 (4%)
V-1           3             3             2             4             2             13+1           27+1 (14%)
V-(2-7)       1             1             -             1             -             4              7
V total       4 (8%)        4 (12%)       2 (20%)       5 (17%)       2 (11%)       17+1 (33%)⁴    34+1 (18%)
VI total      1             -             -             -             +1            3              4+1 (3%)
TOTAL         51+2=53       26+8=34       5+2=7         27+1=29       16+2=18       52+2=54        177+17=194

Notes:

1. Numbers after "+" refer to CSF isolates; all others are from blood.

2. Five cases were aged 4m-1yr and one case was aged 3 yr.

3. Sst III-2 in late onset infection compared with all other groups: p=0.0005, odds
ratio (OR) 6.8; 95% confidence interval (CI) 2.4-19.4.

4. MS-V in elderly compared with all other age-groups: p=0.001, OR 0.28; 95% CI
0.13-0.59.
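
The odds ratios quoted in the notes above can be reproduced from the counts in the age-group table. The sketch below is a hedged illustration, not the authors' stated method: the source does not say how the confidence intervals were obtained, and the log-based (Woolf) interval used here is an assumption that happens to agree with the reported values at the stated precision. The function name is hypothetical, and the p-values are not recomputed.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) CI for the 2x2 table [[a, b], [c, d]].

    The Woolf interval is an assumption; the source does not name its method.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Sst III-2 and late-onset (7 d-3 m) infection: 9 of the 34 late-onset isolates
# (5 blood + 4 CSF) and 17 of all 194 isolates were III-2.
iii2 = odds_ratio_ci(9, 34 - 9, 17 - 9, 194 - 34 - (17 - 9))

# MS V and patients aged >60 yr: 18 of 54 elderly isolates and 35 of all 194
# isolates were MS V. The table is oriented as "not MS V" versus "MS V",
# which gives an odds ratio below 1.
ms_v = odds_ratio_ci(54 - 18, 18, 194 - 54 - (35 - 18), 35 - 18)

print([round(x, 1) for x in iii2])
print([round(x, 2) for x in ms_v])
```

Reversing the orientation of the second table gives the reciprocal odds ratio (about 3.6), which describes the same association from the opposite direction.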
CLAIMS

1. A method of typing a group B streptococcal bacterium which method 
comprises analysing the nucleotide sequence of one or more regions within the 
cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes of said bacterium, said region(s)
comprising one or more nucleotides whose sequence varies between types. 

2. A method according to claim 1 wherein the nucleotide sequence is analysed
for one or more positions corresponding to positions 62, 78-86, 138, 139, 144, 198,
204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627,
636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

3. A method according to claim 1 wherein at least one region is within a 
sequence delineated by the 3' 136 bases of the cpsE gene and the 5' 218 bases of 
the cpsG gene of the cpsE-cpsF-cpsG gene cluster of said streptococcal
bacterium. 

4. A method according to claim 3 wherein the nucleotide sequence is analysed 
for one or more positions corresponding to positions 1413, 1495, 1500, 1501, 
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

5. A method according to any one of claims 1 to 4 wherein at least one region 
is within the cpsI/M genes of said bacterium.

6. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises sequencing said one or more regions. 

7. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises determining whether a polynucleotide obtained 
from said bacterium selectively hybridises to a polynucleotide probe comprising 
one or more of the said regions. 

8. A method according to claim 7 which comprises determining whether the
polynucleotide obtained from said bacterium hybridises to one or more of a
plurality of polynucleotide probes corresponding to one or more of the said regions.

9. A method according to claim 8 wherein the plurality of polynucleotide probes
are present as a microarray.

10. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises an amplification step using one or more 
primers, at least one of which hybridises specifically to a sequence which differs 
between types. 

11. A method according to any one of claims 1 to 6 wherein the nucleotide
sequence analysis step comprises an amplification step using primer pairs, at least
one of which hybridises specifically to a sequence which differs between types.

12. A method according to claim 10 or claim 11 wherein said primers are 
selected from the primers shown in Table 2. 

13. A method of typing a group B streptococcal bacterium which method 
comprises determining the presence or absence in the genome of said bacterium 
of one or more surface protein genes selected from rib, alp2 or alp3 genes.

14. A method according to claim 13 wherein determining the presence or 
absence of said surface protein genes comprises determining whether a 
polynucleotide obtained from said bacterium selectively hybridises to a 
polynucleotide probe corresponding to a region of said surface protein genes. 

15. A method according to claim 13 wherein determining the
presence or absence of said surface protein genes comprises an amplification step 
using one or more primers which amplify specifically a region of said surface 
protein genes. 

16. A method according to claim 15 wherein said primers are selected from the 
primers shown in Table 6. 

17. A method according to any one of claims 1 to 12 which further comprises
determining the presence or absence in the genome of said bacterium of one or
more surface protein genes selected from rib, alp2 or alp3 genes.

18. A method of typing a group B streptococcal bacterium which method
comprises determining the presence or absence in the genome of said bacterium
of one or more mobile genetic elements selected from IS861, IS1548, IS1381,
ISSa4 and GBSi1.

19. A method according to claim 18 wherein determining the presence or 
absence of said mobile genetic elements comprises determining whether a 
polynucleotide obtained from said bacterium selectively hybridises to a 
polynucleotide probe corresponding to a region of said mobile genetic elements. 

20. A method according to claim 18 wherein determining the
presence or absence of said mobile genetic elements comprises an amplification 
step using one or more primers which amplify specifically a region of said mobile 
genetic elements. 

21. A method according to claim 20 wherein said primers are selected from the 
primers shown in Table 10. 

22. A method according to any one of claims 13 to 17 which further comprises
determining the presence or absence in the genome of said bacterium of one or
more mobile genetic elements selected from IS861, IS1548, IS1381, ISSa4 and
GBSi1.

23. A polynucleotide consisting essentially of at least 10 contiguous nucleotides 
corresponding to a region within a cpsD-cpsE-cpsF-cpsG gene of a group B 
streptococcal bacterium, said polynucleotide comprising one or more nucleotides 
which differ between group B streptococcal serotypes. 

24. A polynucleotide according to claim 23 wherein said nucleotides which differ 
between group B streptococcal serotypes correspond to one or more of positions 
62, 78-86, 138, 139, 144, 198, 204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 
457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 
1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 
1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as 
shown in Figure 1.

25. A polynucleotide consisting essentially of at least 10 contiguous nucleotides
corresponding to a region within a sequence delineated by the 3' 136 base pairs of
cpsE and the 5' 218 base pairs of cpsG of the cpsE-cpsF-cpsG gene cluster of a
group B streptococcal bacterium, said polynucleotide comprising one or more
nucleotides which differ between group B streptococcal types.

26. A polynucleotide according to claim 25 wherein said nucleotides which differ 
between group B streptococcal types correspond to one or more of positions 1413, 
1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 
1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in 
Figure 1. 

27. A polynucleotide consisting essentially of at least 10 contiguous nucleotides 
corresponding to a region within a cpsI/M gene of a group B streptococcal
bacterium, said polynucleotide comprising one or more nucleotides which differ 
between streptococcal serotypes. 

28. A polynucleotide according to claim 27 wherein the polynucleotide is 
selected from the nucleotide sequences shown in Table 2. 

29. A polynucleotide consisting essentially of at least 10 contiguous nucleotides 
corresponding to a region within a rib, alp2 or alp3 gene of a group B streptococcal 
bacterium, said polynucleotide comprising one or more nucleotides which differ 
between group B streptococcal subtypes. 

30. A polynucleotide according to claim 29 wherein the polynucleotide is 
selected from the nucleotide sequences shown in Table 6. 

31. Use of a polynucleotide according to any one of claims 23 to 30 in a method
of serotyping and/or subtyping a group B streptococcal bacterium. 

32. A composition comprising a plurality of polynucleotides according to any 
one of claims 23 to 30. 

33. Use of a composition according to claim 32 in a method of serotyping and/or 
subtyping a group B streptococcal bacterium. 

34. A microarray comprising a plurality of polynucleotides according to any one
of claims 23 to 30.

35. Use of a microarray according to claim 34 in a method of serotyping and/or
subtyping a group B streptococcal bacterium.

Figure 1. Multiple sequence alignments of the regions of the 3' end of cpsD-cpsE-cpsF and
the 5' end of cpsG for reference strains of serotypes Ia to VII.

1 50
Consensus          GCAAAAGAAC AGATGGAACA AAGTGGTTCA AAGTTCTTAG GTATTATTCT
                   cpsDS

51 100
Consensus          TAATAAAGTT AATGAATCTG TTGCTACTTA CGGCGATTAC GGCGATTATG

101 150
Consensus          GAAATTACGG AAAAAGGGAT AGAAAAAGGA AGTAAGGGGC TCTTGTATTG
                   cpsD |

151 200
Consensus          AAAGAAAAAG AAAATATACA AAAGATTATT ATAGCGATGA TTCAAACAGT
                   | cpsE

201 250
Consensus          TGTGGTTTAT TTTTCTGCAA GTTTGACATT AACATTAATT ACTCCCAATT

251 300
Consensus          TTAAAAGCAA TAAAGATTTA TTGTTTGTTC TATTGATACA TTATATTGTC

301 350
Consensus          TTTTATCTTT CTGATTTTTA CAGAGACTTT TGGAGTCGTG GCTATCTTGA
                   cpsES

351 400
Consensus          AGAGTTTAAA ATGGTATTGA AATACAGCTT TTACTATATT TTCATATCAA



401 450
Serosubtype III-2
Serotype VI
Serotype Ib        c
Serotype II/III-4  a
Serotype VII       a 1
Serosubtype III-3
Serosubtype Ia-1   a
Serosubtype III-1
Serotype IV
Serotype V
Serosubtype Ia-2   a
Consensus          GTTCATTATT TTTTATTTTT AAAAACTCTT TTACAACGAC ACGACTTTCC
                   cpsEA1

451 500
Serosubtype III-2
Serotype VI
Serotype Ib
Serotype II/III-4  c
Serotype VII       c
Serosubtype III-3  g
Serosubtype Ia-1   1 g
Serosubtype III-1
Serotype IV        a
Serotype V
Serosubtype Ia-2   1 g
Consensus          TTTTTTACTT TTATTGCTAT GAATTCGATT TTATTATATC TATTGAATTC

501 550
Serosubtype III-2
Serotype VI
Serotype Ib        1
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV
Serotype V
Serosubtype Ia-2
Consensus          ATTTTTAAAA TATTATCGAA AATATTCTTA CGCTAAGTTT TCACGAGATA

551 600 

Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 
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Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
CCAAAGTTGT TTTGATAACG AATAAGGATT


CTTTATCAAA AATGACCTTT 


601 


650 


























AGGAATAAAT ACGACCATAA TTATATCGCT 


GTCTGTATCT TGGACTCCTC 


651 


700 



TGAAAAGGAT TG TTATGATT TGAAACATAA CTCGTTAAGG ATAATAAACA 
cpsES1

701 750 



: K — - 

AAGATGCTCT TACTTCAGAG TTA ACCTGCT TAACTGTTGA TCAAGCTTTT 

cpsEA2 

751 800 
Serosubtype Ia-2

Consensus ATTAACATAC CCATTGAATT ATTTGGTAAA TACCAAATAC AAGATATTAT 

801 850 

Serosubtype III-2 

Serotype VI — t 

Serotype Ib

Serotype II/III-4 

Serotype VII . 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus TAATGACATT GAAGCAATGG GAGTGATTGT CAATGTTAAT GTAGAGGCAC 

851 900 

Serosubtype III-2 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus TTAGCTTTGA TAATATAGGA GAAAAGCGAA TCCAAACTTT TGAAGGATAT 



901 950 

Serosubtype III-2 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 

Serotype V 

Serosubtype Ia-2 . 

Consensus AGTGTTATTA CATATTCTAT GAAATTCTAT AAATATAGTC ACCTTATAGC 



951 1000 

Serosubtype III-2 

Serotype VI t 

Serotype Ib t

Serotype II/III-4 t 

Serotype VII t

Serosubtype III-3 

Serosubtype Ia-1 : 

Serosubtype III-1
Serotype IV
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 



AAAACGATTT TTGGATATCA CGGGTGCTAT TATAGGTTTG CTCATATGTG 



1001 



1050 



-a- 
-a- 



-a- 
-a- 
-a- 



GCATTGTGGC AATTTTTCTA GTTCCGCAAA TCAGAAAAGA TGGTGGACCG 



1051 



1100 



GCTATCTTTT CTCAAAATAG AGTAGGTCGT AATGGTAGGA TTTTTAGATT 



cpsES2 
1101 



1150 



CTATAAATTC AGATCAATGC GAGTAGATGC AGAACAAATT AAGAAAGATT 



1151 



cpsEA3 



1200 



-a- 
-a- 



-a- 
-a- 
Serotype V

Serosubtype Ia-2 

Consensus TATTAGTTCA CAATCAAATG ACAGGGCTAA TGTTTAAGTT AGACGATGAT 

1201 1250 

Serosubtype III-2 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV — 

Serotype V 

Serosubtype Ia-2 

Consensus CCTAGAATTA CTAAAATAGG AAAATTTATT CGAAAAACAA GCATAGATGA 

1251 1300 

Serosubtype III-2 

Serotype VI a 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 1 

Serotype V g 

Serosubtype Ia-2 

Consensus GTTGCCTCAA TTCTATAATG TTTTAAAAGG TGATATGAGT TTAGTAGGAA 



1301 1350 

Serosubtype III-2 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus CACGCCCTCC CACAGTTGAT GAATATGAAA AGTATAATTC AACGCAGAAG 



1351 1400 

Serosubtype III-2 

Serotype VI : 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1
Serotype IV :

Serotype V _ 

Serosubtype Ia-2 - 

Consensus CGACGCCTTA GTTTTAAGCC AGGAATCACT GGTTTGTGGC AAATATCTGG 

1401 1450 

Serosubtype III-2 

Serotype VI „; 

Serotype Ib — :

Serotype II/III-4 

Serotype VII ; 

Serosubtype III-3 — c - 

Serosubtype Ia-1 — c — : i— ~ 

Serosubtype III-1 —

Serotype IV — „ 

Serotype V — 

Serosubtype Ia-2 — c - 

Consensus TAGAAATAAT ATTACTGATT TTGATGAAAT CGTAA AGTTA GATGTTCAAT 

1451 1500

Serosubtype III-2 . 

Serotype VI . a 

Serotype Ib q

Serotype II/III-4

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1 .

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus ATATCAATGA ATGGTCTATT TGGTCAGA TA TTAAGATTAT TCTCCTAACA 
cpsES3 

1501 1550 

Serosubtype III-2 -t c 

Serotype VI -t — c — — 

Serotype Ib -t — • c —

Serotype II/III-4 t ■ 

Serotype VII t 

Serosubtype III-3 1 

Serosubtype Ia-1 - f 

Serosubtype III-1 -t c — « -

Serotype IV 1 ■ 

Serotype V -t c — 

Serosubtype Ia-2 1 

Consensus CTAAAGGTAG TCTTACTTGG GACAGG AGCT AAGTAAAGGT AAGGTTTGAA 

cpsE | cpsEFA 
1551 1600 

Serosubtype III-2 

Serotype VI — c 

Serotype Ib — c

Serotype II/III-4 — 

Serotype VII - — 

Serosubtype III-3 : 

Serosubtype Ia-1

Serosubtype III-1 —

Serotype IV 
Serotype V
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



AGGAATATAA TGAAAATTTG TCTGGTTGGT TCAAGTGGTG GTCATCTAGC 
I cpsF 

1601 1650 



ACACTTGAAC CTTTTGAAAC CCATTTGGGA AAAAGAAGAT AGGTTTTGGG 



1651 1700 

Serosubtype III-2 

Serotype VI — 

Serotype Ib 1 •

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serotype IV 

Serotype V 

Serosubtype Ia-2 — 

Consensus TAACCTTTGA TAAAGAAGAT GCTAGGAGTA TTCTAAGAGA AGAGATTGTA 



1701 1750 

Serosubtype III-2 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 — 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 

Serotype V 

Serosubtype Ia-2 : 

Consensus TATCATTGCT TCTTTCCAAC AAACCGTAAT GTCAAAAACT TGGTAAAAAA 



1751 1800 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



TACTATTCTA GCTTTTAAGG TCCTTAGAAA AGAAAGACCA GATGTTATCA 



1801 



1850 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



TATCATCTGG TGCCGCTGTA GCAGTACCAT TCTTTTATAT TGGTAAGTTA 



1851 



cpsFS 



1900 



TTTGGTTGTA AGACCGTTTA TATAGAGGTT TTCGACA GGA TAGATAAACC 
cpsFA 

1901 1950 



AACTTTGACA GGAAAATTAG TGTATCCTGT AACAGATAAA TTTATTGTTC 



Serosubtype III-2 
Serotype VI 
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1



1951 



2000 
Serotype IV

Serotype V 

Serosubtype Ia-2 ; ™ 

Consensus AGTGGGAAGA AATGAAAAAA GTTTATCCTA AGGCAATTAA TTTAGGAGGA 

2001 2050 

Serosubtype III-2 

Serotype VI 

Serotype Ib a

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus ATTTTTTAAT GATTTTTGTC ACAGTGGGGA CACATGAACA GCAGTTCAAC 
cpsF | cpsG 

2051 2100 

Serosubtype III-2 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV a — 

Serotype V 

Serosubtype Ia-2 

Consensus CGTCTTATTA AAGAAGTTGA TAGATTAAAA GGGACAGGTG CTATTGATCA 

2101 2150 

Serosubtype III-2 c 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 : 

Serosubtype Ia-1 

Serosubtype III-1 c

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus AGAAGTGTTC ATTCAAACGG GTTACTCAGA CTTT GAACCT CAGAATTGTC 

cpsGS 

2151 2200 

Serosubtype III-2 

Serotype VI 

Serotype Ib

Serotype II/III-4 

Serotype VII g g 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1
Serotype IV
Serotype V 
Serosubtype Ia-2 
Consensus 



AGTGGTCAAA ATTT CTCTCA TATGATGATA TGAACTCTTA CATGAAAGAA 

cpsGA2 



cpsGA1



2201 2226 

Serosubtype III-2 ~ 

Serotype VI 

Serotype Ib c

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-1

Serotype IV 

Serotype V 

Serosubtype Ia-2 ■ 

Consensus GCTGAGATTG TTATGACACA TGGCGG 
cpsGA3 



Notes. 

Position "1" of the numbering corresponds to position "1" of GenBank accession number AF332908 (for
serotype IV reference strain 3139).

Serosubtype Ia-1: strain 090, GenBank accession number AF332893;

Serosubtype Ia-2: strain NZRM 908 (NCDC SS615), GenBank accession number AF332894;

Serotype Ib: strain H36B, GenBank accession number AF332903;

Serotype II/III-4: strain 18RS21, GenBank accession number AF332905;

Serosubtype III-1: strain SG99/056, GenBank accession number AF332899;

Serosubtype III-2: strain M781, GenBank accession number AF332896;

Serosubtype III-3: strain NZRM 912 (NCDC SS620), GenBank accession number AF332897;

III-4 (Subtype III-4): strain SG96/220, GenBank accession number AF363036;

Serotype IV: strain 3139, GenBank accession number AF332908; 

Serotype V: strain CJB 111, GenBank accession number AF332910; 

Serotype VI: strain SS1214, GenBank accession number AF332901; 

Serotype VII: strain 7271, GenBank accession number AF332913. 
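
The reference strains listed in the Notes above can be held in a small lookup table when the alignment is handled programmatically. The following Python sketch is illustrative only and forms no part of the described method. The names REFERENCE_STRAINS, NUMBERING_REFERENCE, accession_for and block_for_position are hypothetical; the label for strain 18RS21 is assumed to be II/III-4, as used for the corresponding alignment row; and the block width of 50 is taken from the position labels printed with the figure.

```python
# Reference strains and GenBank accession numbers transcribed from the Notes.
# The dictionary layout and all names here are illustrative assumptions.
REFERENCE_STRAINS = {
    "Ia-1": ("090", "AF332893"),
    "Ia-2": ("NZRM 908 (NCDC SS615)", "AF332894"),
    "Ib": ("H36B", "AF332903"),
    "II/III-4": ("18RS21", "AF332905"),  # label assumed from the alignment rows
    "III-1": ("SG99/056", "AF332899"),
    "III-2": ("M781", "AF332896"),
    "III-3": ("NZRM 912 (NCDC SS620)", "AF332897"),
    "III-4": ("SG96/220", "AF363036"),
    "IV": ("3139", "AF332908"),
    "V": ("CJB 111", "AF332910"),
    "VI": ("SS1214", "AF332901"),
    "VII": ("7271", "AF332913"),
}

# Per the Notes, position 1 of the alignment is position 1 of this record.
NUMBERING_REFERENCE = "AF332908"

# The figure prints 50 alignment columns per block (e.g. 201-250).
BLOCK_WIDTH = 50


def accession_for(label: str) -> str:
    """Return the GenBank accession number listed for a serotype or serosubtype."""
    _strain, accession = REFERENCE_STRAINS[label]
    return accession


def block_for_position(position: int, width: int = BLOCK_WIDTH) -> tuple[int, int]:
    """Return the (start, end) labels of the printed block containing a position."""
    if position < 1:
        raise ValueError("alignment positions are 1-based")
    start = ((position - 1) // width) * width + 1
    return start, start + width - 1
```

For example, block_for_position(1575) returns the pair (1551, 1600), which is the block in which the cpsE | cpsEFA annotation is printed.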
Figure 3. Multiple sequence alignment of the cpsG-cpsH-cpsI/M gene sequences for
serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons are highlighted).



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III
Serotype VI
Consensus 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



ATGATTTTTG 
ATGATTTTTG 
ATGATTTTTG 
ATGATTTTTG 
ATGATTTTTG 
ATGATTTTTG 
********** 

cpsG 
51 

TAAAGAAGTT 

TAAAGAAGTT 

TAAAGAAGTT 

TAAAGAAGTT 

TAAAGAAGTT 

TAAAGAAGTT 
********** 

101 

TCATTCAAAC 

TCATTCAAAC 

TCATTCAAAC 

TCATTCAAAC 

TCATTCAAAC 

TCATTCAAAC 
********** 

151 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 

AAATTTCTCT 
********** 

201 

TGTTATCACA 

TGTTATCACA 

TGTTATCACA 

TGTTATCACA 

TGTTATCACA 

TGTTATCACA 
********** 



TCACAGTGGG 
TCACAGTGGG 
TCACAGTGGG 
TCACAGTAGG 
TCACAGTGGG 
TCACAGTGGG 



GACACATGAA 
GACACATGAA 
GACACATGAA 
GACACATGAA 
GACACATGAA 
GACACATGAA 



*******_** ********** 



CAGCAGTTCA 
CAGCAGTTCA 
CAGCAGTTCA 
CAGCAGTTCA 
CAGCAGTTCA 
CAGCAGTTCA 
********** 



GATAGATTAA 
GATAGATTAA 
GATAGATTAA 
GATAGATTAA 
GATAGATTAA 
GATAGATTAA 
********** 



GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 
GGGTTACTCA 
********** 



CATATGATGA
CATATGATGA
CATATGATGA
CATATGATGA
CATATGATGA
CATATGATGA 
********** 



CATGGCGGTC 

CATGGCGGCC 

CATGGCGGTC 

CACGGCGGTC 

CATGGCGGCC 

CATGGCGGCC 
**_*****-* 



AAGGGACAGA 
AAGGGACAGG 
AAGGGACAGG 
AAGGGACAGG 
AAGGGACAGG 

AAGGGACAGG 
*********_ 



GACTTTGAAC 
GACTTTGAAC 
GACTTTGAAC 
GACTTTGAAC 
GACTTCGAAC 
GACTTTGAAC 
*****_**** 



TATGAACTCT 
TATGAACTCT 
TATGAACTCT 
TATGAACTCT 
TATGAACTCT 
TATGAACTCT 
********** 



CAGCGACGTT 
CAGCGACGTT 
CAGCGACGTT 
CAGCAACGTT 
CAGCGACGTT 

CAGCGACGTT 
****_***** 



TGCTATTGAT 

TGCTATTGAT 

TGCTATTGAT 

TGCTATTGAT 

TGCTATTGAT 

TGCTATTGAT 
********** 



CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 
CTCAGAATTG 
********** 



TACATGAAAG
TACATGAAAG
TACATGAAAG
TACATGAAAG
TACATGAAAG
TACATGAAAG
********** 



TATGAATGCA 
TATGAATGCA 
TATGAATGCA 
TATGAATGCA 
TATGTCAGTT 
TATGTCAGTT 
**** * — 



50 

ACCGTCTTAT 
ACCGTCTTAT 
ACCGTCTTAT 
ACCGTCTTAT 
ACCGTCTTAT 
ACCGTCTTAT 
********** 

100 

CAAGAAGTGT 
CAAGAAGTGT 
CAAGAAGTGT 
CAAGAAGTGT 
CAAGAAGTGT 
CAAGAAGTGT 
********** 

150 

TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 
TCAGTGGTCA 
********** 

200 

AAGCTGAGAT 

AAGCTGAGAT 

AAGCTGAGAT 

AAGCTGAGAT 

AAGCTGAGAT 

AAGCTGAGAT 
********** 

250 

GTTTCTAAAG 

GTTTCTAAAG 

GTTTCTAAAG 

GTTTCTAAAG 

ATTTCTTTAG 

ATTTCTTTAG 
_*+*** ** 
Serotype IV
Serotype V
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



251 

GGAAAAAAAC 
GAAAAAAAAC 
GGAAAAAAAC 
GGAAAAAAAC 
GGAAATTACC 
GGAAATTACC 
*_* 



TATTGTGGTT 
TATTGTGGTT 
TATTGTGGTT 
TATTGTGGTT 
AGTTGTTGTT 
AGTTGTTGTT 
•**+*_*** 



CCTAGACAAG 
CCTAGACAAG 
CCTAGACAAG 
CCTAGACAAG 
CCTAGGAGAA 
CCCAGGAGAA 
**_*+ +_ 



AACAGTTTGG 
AACAGTTTGG 
AACAGTTTGG 
AACAGTTTGG 
AGCAGTTTGG 
AGCAGTTTGG 
*_•***++*** 



300 

AGAGCATGTG 
AGAGCATGTG 
AGAGCATGTG 
AGAGCATGTG 
TGAACATATC 
TGAACATATC 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



301 

AATAATCATC 
AATAATCATC 
AATAATCATC 
AATAATCATC 
AATGATCATC 
AATGATCATC 



AGGTGGATTT 
AGGTGGACTT 
AGGTGGATTT 
AGGTGGATTT 
AAATACAATT 
AAATACAATT 



TGTTAATAAG 

TGTTAATAAG 

TTTGAAAGAG 

TTTGAAAGAG 

TTTAAAAAAA 

TTTAAATTCG 




GTAAAAACAA 
GTAAAAACAA 
TTATTCTTGA 
TTATTCTTGA 
ATTGCCCACC 
ATTGCCCACC 



350 

TGTATAATTT 
TGTATAATTT 
AAATTGAATT 
AATATGAGTT 
TGTATCCCTT 

TGTATCCCTT 
* ** 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



351 

TGATATCGTT 
TGATATCGTT 
AGATTATATT 
AGATTATATT 
GGCTTGGATT 
GGCTTGGATT 
**



GTAGATATTG 
GTAGATATTG 
TTGAATATCA 
TTGAATATCA 
GAAGATGTAG 
GAAGATGTAG 




AAAGGTTACA 
AAAGGTTACA 
GTGAATTAGA 
GTGAATTAGA 
ATGGACTTGC 
ATGGACTTGC 
* 



AAATGTAGTC 
AAATGTAGTC 
GAATATTATT 
GAATATTATT 
GGAAGCGTT. 
GGAAGCGTT.
+ *_ 



400 

TATGAGGGGA 
TATGAGGGAA 
AAGGAAAAAA 
AAGGAAAAAA 
..GAAAAGGA
..GAAAAGGA
* * 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



401 450 

CGATGAATCG TCCGTTTTTA GAAACTAACA GAAGTAATTT TATT 

TGATGAATCG TCCGTTTTTA GAAACTAATA GTAGTAATTT TATT 

ATATATCTAC TAGTAAAGTA ATATCACAAA ACAATGATTT TTGTTTCTCT
ATATATCTAC TAGTAAAGTA ATATCACAAA ACAATGATTT TTGTTCCTCT

ATATAGCTAC AGAAAAATAT CAGGGAAATA ATGATATGTT TTGT 

ATATAGCTAC AGAAAAATAT CAGGGAAATA ATGATATGTT TTGT 

** *„ *-* * +* * — ★ 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



451 500 

GAAGAA TTTAAGGTAA TATTAAAGGA 

GAAGAA TTTAAGGTAA TATTAAAGGA 

TTCAAAAATG AACATTTCAT AAACTATTTG AATAAATATA TTTTGTTGGA 
TTCAAAAATG AAC..TTTCT AAACTATTTG AATAAATATA TTTTGTTGGA

CATA AATTAGAAAA AATTATAGGT 

CATA AATTAGAAAA AATTATAGGT 

*_* * ★* **_ 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



501 550 

GTTGTGTGAT GAAA ATCAATAAA AACTCTTTAT TTTATATTGC 

GTTGTGCGAT GAAA ATCAATAAA AACTCTTTAT TTTATATTGC 

GAAAAAAATT GAAATTAACA TATCAATCCA AAGTATTTGT TAATAGGAGG 
GAAAAAAATT GAAATTAAC. TATCAATCCA AAGTATTTGT TAATAGGAGG 
GAAATATGAG GAAAT....A TCTAGATTTA GATTATTCTT TATTTTATGC

GAAATATGAG GAAAT A TCTAGATTTA GATTATTCTT TATTTTATGC 

* **** — * — *+ — * — * * *_ 
Serotype IV
Serotype V
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



551 

AATATTTTTA 
AATATTTTTA 
AATTTTCGCT 
AATTTTCGCT 
TCTTTGGGTA 
TCTTTGGGTA 



GTTAATTTTT 
GTTAATTTTT 
TTAACCCTAT 
TTAACCCTAT 
CTTATTTTAG 
CTTATTTTAG 
+ 



TTAAATCACT 
TTAAATCACT 
TTTCAAAGCC 
TTTCAAAGCC 
TACCAAACCA 
TACCAAACCA 
* * 



AGGTTTAGGA 
GGGTTTAGGC 
AATGCAACTT 
AATGCAACTT 
ATGGTATCAG 
ATGGTATCAG 



600 

GAGGGGAACT 
GAGGGAAACT 
TTGTTACTTT 
TTGTTACTTT 
TTTTTAATTA 
TTTTTAATTA 



601 

CAACTTACAA 

CAGCTTACAA 

TAGCATTAAT 

TAGCATTAAT 

TTACCATTAT 

TTACCATTAT 
*_ +_ 



AATAGTGATG 
AATAGTGATG 
AGTTTTACTT 
AGTTTTACTT 
AGTTCTATTA 
AGTTCTATTA 
* *_ 



TTTGTTGCAA 
TTAGTTGCAA 
ATTTGTAGTA 
ATTTGTAGTA 
TTACTTTGGA 

TTACTTTGGA 



cpsH 

TCTTCTTGTG 
TTTTACTGTG 
GTTATAAGAA 
GTTATAATGA 
AGAGTGAGTT 
AGAGTGAGTT 



650 

TGGAATAAAA 
TGGAATAAAA 
AAAAATGAAA 
AAAAATGAAA 
TAGAAT...A
TAGAAT...A



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



651 

TTTTTA....

TTTTTA....

TTTTTATATA 

TTTTTAAATA 

TCTATAAGCA 

TCTATAAGCA 




..TTAGATAG
..TTAGATAG
TGGCTGAAAT 
TGGCTGAAAT 
ATTCTTCAAT 
ATTCTTCAAT 




CCTTTATTTT 
CCTTTATTTT 
TTTTTTCATT 
TTTTTTCATT 
ACTATTTCTG 
ACTATTTCTG 
— *_ 



GAAAGAAGAA 
GAAAGAAGAA 
GTATTTTATA 
GTATTTTATA 
CTTTGGTTAT 
CTTTGGTTAT 



700 

AACTCGTTAT 
AACTCGTGAT 
TCATTTATTT 
TGGTTTATTT 
TTATTTATTT 
TTATTTATTT 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



701 

CATCTTTTTA 
CATCTTTTTA 
AACTTCAATA 
AGTATCAATA 
ATTTGCAATA 
ATTTGCAATA 
** 



TTATTTATTG 
TTATTTATCG 
TTGCTACATT 
GTATTAAATT 
CTCATTAGAG 
CTCATTAGAG 




CGACCATTTT 
CGACCATTTT 
CTTTGTTTAA 
CGTTATTTAG 
GTACTCAAGA 
GTACTCAAGA 



GAATTTATTC 
GAATTTATTC 
AACTCCTGAT 
AAGTCCAGAA 
GGATATAACG 
GGATATAACG 



750 

TTTGTTCATA 

TTTGTTCATA 

TTTGATAGAA 

TTTCATAGAG 

TTTCAGCGAT 

TTTCAGCGAT 
*** 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



751 800 

AGGTTACTTT TATATTAA C TTTAATTTTT 

AGGTTACTTT TATATTAA. C TTTAATTTTT 

TTTTAGCAGC TTTTAACTCG TTGATTATCG GTATAGTATC AGTGGCTTTG 

TCATTGCTGC ATTCAATTCA CTGGCAGTAG GGGTTGTGTC CTTATTATTT 

TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC TTTATTTTTT 

TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC TTTATTTTTT 
* — * — + * + **_ 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III
Serotype VI
Consensus



801 

TTTCTAGCAT 
TTTCTAGCAT 
AAACGGTGGT 
TACCATTACT 
TATAATTATT 
TATAATTATT 



TAAAGGATAT 
TAAAGGACAT 
ATAAGAATAC 
ATAAGAATAC 
ATAGAAAAGC 
ATAGAAAAGC 



CTCTCTAAAA 
CTCTCTAAAA 
AACTTTGGAG 
TAATATTGAA 
TGATTTTAAT 
TGATTTTAAT 
*_* *- 



AAAGCTTTCT 
AAAGCTTTCT 
TTAGATAAAA 
TTAACAAAAT 
AGTTCAGTTG 
AGTTCAGTTG 



850 

CTATAATAAT 
CTATAATAAT 
TATTAAAAGC 
TGCTAAAATC 
TAAGGAATGT 
TAAGGAATGT 
Serotype IV
Serotype V
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



851 

AGGATCGCGT 
AGGATCGCGT 
ATTTTTATTT 
ATTTTTGTTT 
GGTAAAGGTT 
GGTAAAGGTT 



ATTTTGGGAG 
ATTTTGGGAG 
AATGGGTTAA 
AATGCAATTA 
AACTATTTTG 
AACTATTTTG 



TTCTATTAAA 
TTCTATTAAA 
TCCTATTTTT 
TTTTGTTTTG 
TGTTGTTTCT 
TGTTGTTTCT 
* *_++ '. 



TCAAATTTTT 
TCAAATTTTT 
TTTAGGGGGA 
TTTAGGATTT 
TATAACAGTT 
TATAACAGTT 



900 

GTGAAATTAG 
GTGAAATTAG 
ACATATTATT 
CTATATTATT 
TTATATT...
TTATATT...



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



901 

ATTTAATAGA 
ATTTAATAGA 
ATTGTTTGCA 
ATGCCATATA 

TATT 

TATT 

* 



AATTAAATAT 
AATTAAGTAT 
TAATAATATT 
TTTTGATGTA 
TTTTCCTATG 
TTTTCCAAAT 
* 



ATCAATTTTT 
GTCAATTTTT 
CAAAATATCA 
GAGAATGTAA 
CTGAAGCCAA 
GAATTTACTA 



ATAGGGATGG 
ATAGGGATGG 
GTATTTTTGG 
GTCTTTTTGG 
CTTTATTTGG 
CATTCCTAGG 
** 



950 

ACAATTTATT 
ACAATTTATT 
TAGAGATTTG 
AAGAAATTTA 
AAGAGAATTG 
AAGAGATTTA 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



951 1000 

CTGAGAAGTG ACT TAGG 

CTGAGAAGTG ACT TAGG 

ATTGGGTCAG ACTGGATTAA TGGTATGCAT ACTCAAAGAG CAATGGGATT 
ATTGGATCAG ATTGGATAAA TGGGATGCAT ACGCAGAGAG CAATGGCTTT 
TTTTCAATAG AGTGGTTTCC ACATATG... AGAATAAGAC TTGCGGCATA
TTTTCAATTG AATGGATTCC TTCTATG... AAAGTTAGAC TTACTGCATA



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



1001 

TTTTGGTCAT 
TTTTGGTCAT 
TTTTGAATAT 
CTTTGAATAT 
TTTTGAATAT 
TTTTGAGTAT 
_***+ +* 



CCTAACTTTA 

CCTAACTTTA 

TCAAACCTTA 

TCAAATCTTA 

GCTACACTAA 

GCAACACTAT 
__*_★ * 



TTCATAATTT 
TTCATAATTT 
TAATTCCTAT 
TAATACCCTT 
TTGGTCAGTT 
TAGGTCAGTT 



TTTTGCAGTA 
TTTTGCTCTA 
GACAGTGGTA 
AACTATCATA 
TATTTTATTT 
TATTTTATTC 



1050 
ACTGTTTTTT 
ACTATTTTCT 
ACTAACTATA 
ACTAA.TATA 
TCTTATCCCA 
ACTTATCCGA 



1051 1100 

Serotype IV TATATGTAAC ACTTTTTTAT AGAAAACTAA GAT.TAATAA CTATTGCTTT 

Serotype V TGTATATTGT ACTCAATTAT AAACGACTAA AGC.CTGTTG TGATGGTTTT 

Serotype Ia TATATATATA T.TATATGAA GTTAAGAAAC TATTCAATTA TGACCATAGG

Serotype Ib TATATATATA TATATATTAA GCAAAGATAT AGCTCAGGGA TGATGATACT

Serotype III TAC, TTTTTTTGAA ACCCCAAAAA CATATGGAAA ATATTTTAAT 

Serotype VI TAT TATTTTTAAA ACAGCAGAGG TATGGAGAAA ATATTTTTAT 

Consensus * — * ■ — * 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



1101 

TATTTTAACT 
ATTTTTAACA 
TGTTGTATTA 
CGGTGCTCTT 
ATCCTTACTG 
CACACTATTC 



CTAAATTACT 
TTAAATTATT 
TTATTTACCT 
CTCTCCACTA 
TTGACTATAT 
CTAGTTTTTT 



TCTTGTATCA 
TATTGTACCA 
TTATTTTACC 
TTATACTACC 
GTTCATACTT 
GTGCATATTT 



GTATACTTAT 
ATATACTTTT 
TATTGGATCG 
CATCGGGTCT 
TTCTGGCGCT 
GACAGGGGCA 



1150 
TCAAGAACTG 
TCAAGGACAG 
GGCTCCAGGG 
GGATCTAGAG 
AGAATACTAT 
AGAATTTTCC 
Serotype IV
Serotype V
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



1151 

GATATTATAT 

GGTATTATAT 

CTGGAATAGT 

CTGGTATTAT 

TGGTCTGTAT 

TAATTTGTAT 
* 



AGTACTCTTA 

CGTAATTTTA 

AGCTATATTG 

AGTTGTGCTA 

GTTGGTTTTA 

GATAATTTTA 
+ +_ 



TTTATACTTA 

TTTATTGTAC 
GCGCAGATGT 
CTACAGGTTA 
TTAGCATCGC 
TTAGGTTATT 



TTATATATGT 
TCATTTATGT 
TTATTCTTCT 
TAATTTTATT 
TTCTTTTAGA 
TACTCTTAGA 



1200 
TACAAAGAAT 
GACAAAGAAT 
TCTAAATACA 
GTTGAATACA 
TTATATCCTT 
AATAATCATT 



1201 

Serotype IV AACCTGATAA GGAAAATTTT 

Serotype V AGCTTAATAA AAAGAGTATT 

Serotype Ia GTTGTCGTAA AGAAGAAAAC

Serotype Ib ATTGTAATAA AAAGACAAAC

Serotype III TTTAAAACTA ATTTGAAATT 

Serotype VI AATAAATTTA ACCTAAAAAT 

Consensus * 

1251 

Serotype IV CTTGTTAGCA TTTACTTTTC 

Serotype V TTTATTAGTA TTTACCTTTT 

Serotype Ia TCTACTAGTA ATAGTAATGA

Serotype Ib ACTAATATTA CTATTAGTGA

Serotype III GACTTTCTTA TTTATCACCG 

Serotype VI AGGGATAATA TTATTATTGG 

Consensus * * -* 



1250 

TATGATAGTT GCTCCGTACA TACAACTGTT 

TATGAAATTA GCACCCTATG TACAATTTTT 

TATAAAATTT TTATTGTACA TACTTCCGTT 

GATAAGATTT TTCCTGTATT TAGTTCCGAT 

GACCAAGAAA AACACTTTTA TACTTGGTAT 

TACTAAAAAA GCTGTCTTTT TGATAATTAT 
.* — * + * * 

1300 

TTTGCTCTAC TATTTTTTTC AACTCAAATT 
TGAGTTCTAC AATTTTTTTT AATTCAAATT 
TGTTATATTT TGATAACTTA CTATCTATAT 
TATTACGTTT TGATAATTTG GTGAGCATAT 
CTTGTTTTTC TTATAACATA TGGTCAATAA 
TATGTTTTTC TTACAAAGTG GAGTCTATTA 



1301 

Serotype IV TTGTTCAAAA ATTAGATAGC 

Serotype V TTGTTCAAAA ATTAGATGTT 

Serotype Ia ATTATCGTAT AATTAATTTG

Serotype Ib ATAATAGAAT AATCAATTTG

Serotype III TTGAAAAAAT AATTATGTAC 

Serotype VI TCAATTATAT AATACACTAT 

Consensus *- *-* 



1350 

CTTTTGACAG GTAG 

CTTTTAACAG GTAG 

CGATCCGGGA GTAGTGAATC CAGATTTTCT 
CGGTCGGGAA GTAGTGAATC TAGATTTTCT 
AGAAACCAAA GTACTATCAC TAGGATGATA 
AGATTTCAAA GTAGTAGTAC AAGATTGACA 
+** . 



1351 1400 

Serotype IV . . GTTAAACT ATGCTCATTT ACAGCTTGTA GACGGCTTAA CTCTTTTTGG 

Serotype V . .ATTACACT ATGCTCATTT ACAACTTGTA GATGGTTTAA CTCCTTTTGG 

Serotype Ia GTATATAAAG ATACAGTAAA CATCGTTATA AATAATTCTT TATTATTTGG

Serotype Ib TTGTACAAGG ATACCGTACA CTCAGTAATT ACTGACTCAC TATTTCTGGG

Serotype III GTTTATCAAG AAAGTATTAT TGAAGTTCTA AAAGGAAATA TTTTATTTGG 

Serotype VI GTCTATTACG AAAGTATAAG AGCGATTTTA GATGGGAATT TCCTTATTGG 

Consensus * * — * + — *- 

1401 1450 

Serotype IV AAATAGTTTT AAGGAG A CGAGTGTCCT 

Serotype V AAATAGTTTT AAGGAA A CAAGTGTCCT 

Serotype Ia AGAAGGAGTT AAAGAGTTAT GGTTAAATAG TGATCTACCT TTGGGGTCGC

Serotype Ib AAAAGGTGTA AAAGAATTGT GGTTAAATAG TGATTTACCA CTAGGATCGC

Serotype III ACAGGGTATA AGGA. . .TTC CATCAAGTGA AGGAATATTC CTAGGATCGC 

Serotype VI GCAAGGTATA AGAG . . . TTC CCTCCAGTGT GGGAATATTT TTAGGTTCAC 

Consensus — * — * — *- * * — ++ — 
1451

Serotype IV ATTTGATAAT 

Serotype V ATTTGATAAT 

Serotype Ia ATTCAACGTA

Serotype Ib ATTCGACCTA

Serotype III ATTCTACTTA 

Serotype VI ATTCATCATA 

Consensus *** 



AGCTACTCTA TGTTATTGAG 
AGCTACTCTA TGTTATTGAG 
TATAGGCTAT TTCTACAAAA 
CATAGGTTAT TTCTATAAAA 
TATTAGTGTC TTTTACAGGA 
CATTAGTATA TTTTATAGAA 



1500 

TATGTATGGT GTAGTACTTA 
TATGTATGGT GTAGTACTTA 
GTGGCCTGCT GGGATTAATG 
CTGGCCTATT TGGACTAATA 
CTTCTTTATT AGGAATTGTT 
CTTCTTTTAC GGGGCTGTTT 



1501 1550 

Serotype IV CCATGTTTTG TATGATAATC TATTATATCT ATAGTAAAAA AGTCAATGTA 

Serotype V CCATGTTTTG TATGATAATC TATTATATCT ATAGTAAAAA GATAATCATA 

Serotype Ia AATATAGTTC CAGGTTTGCT TTTAAT.TTT TACTAATATT GGTAGGAAAG

Serotype Ib AATGTGATTT TAGGTTTGTT TCTAAT.TCT TATTAGCATT ATCAAGGAAG

Serotype III CTTTATTTTT CTGCCTTTAT ACTTTTATAT AAAGAAGCGA TTTCAAAAAA 

Serotype VI CTTTTCTTTT CAATATTACT TTTTCTATAT AGAGAAGCTA TCAAACAAAA 

Consensus ** * * . 

1551 1600 

Serotype IV GTTGAGCTCC AGATACTTTT GTTTA TA 

Serotype V ATTGAACTTC AACTACTCCT ATTTA . . . ] . ] ] [tA 

Serotype Ia CTAAACAATC AGCTTTTTAT TATGAGATAG TAGGAACACT TATAACTTTA

Serotype Ib CTAAAAAGTC AGATTTCTAT TATGAGATAG TAGGGTCTGT CATACTCCTA

Serotype III TTATAAAATC TACAGATTAT TTT T TTATACGTTA 

Serotype VI CAGGATAATC TACAAGCTTT TTT T TGGATTGTTA 

Consensus * * * * 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



1601 

ATGTCTATAG 

ATGTCTATAA 

TTCTCATTTT 

TTTTCATTTT 

TTATGTTACA 

TTATTGTATA 




TATTATTTAC 
TATTATTTAC 
TTGCACTTGA 
TTGCACTTGA 
CGCTCTTTGA 
TGGTATTTGA 



AGAGAGTTTT 
TGAAAGTTTT 
AGATCTTGAC 
AGATATTGAT 
GGAAATAGAT 
AGAATTTGAT 



TACCCAAGTA 
TATCCCAGTG 
GGAGCTAATT 
GGCGCCAATT 
CCTAATCATT 
CCTAATCATT 



1650 
TAGTTATGAA 
TGGTAATGAA 
GGCTTATTGT 
GGCTCATTAT 
GGAGTATTGT 
GGAGTGTTGT 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



1651 

TATTAGTTGG 
TATTAGTTGG 
TTTTATTTTT 
TTTTGTCTTT 
ATTATTATTC 
ATTGTTATTT 



ATGGTTTTTG 

CTAGTTTTTG 

ACAGTGTTAG 

ACAGTGTTGG 

TCAACTTTTG 

ACTACATTAG 




GGAAAATATT 
GTAAAATATT 
GAATTTTAGA 
GAATTTTAGA 
GTATAGTGGG 
GTATAGTAGG 



TTGTGGGGGT 
TTGTGATGGT 
AAATAAGGAT 
AAATAAGGAT 
AAGGGCTAA. 
GAGAG.GGA. 



1700 
GTAGATGATT 
ATCGAACCTA 
TTTTATAGTC 
TTCTATAGTC 



1701 

Serotype IV TACAAC GAGAGTT 

Serotype V TAAAAA AGGAATT 

Serotype Ia AACTTAAAAG GTGGAAAAGT

Serotype Ib AACTTAAAAG GTGGGAAAGT

Serotype III 

Serotype VI 

Consensus 



1750 

CACTTGGACG GCAAATAAAA ATTAGTGTAA 
TACT... ATT GTGAATAATA TATGACATAT 
TAATGGAAAA ACGAATACTT GTTTCTATCA 
TAATGGAAAA ACAAATACTT GTTTCTATCG 

AAAAT GAAAGAAAAA GTAACAGTCA 

ATGAT AAAAAAACTA GTTAGTGTGA 
Serotype IV
Serotype V
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



1751 

TTGTACCAGT 

TTGCTCTGAT 

TTATACCTAT 

TTATACCTAT 

TTATACCTAT 

TTGTTCCAGT 
** * * 



ATATAATTCG 
ATGGCAGGAG 
ATACAACTCA 
ATACAACTCG 
ATACAACTCA 
TTATAATTCG 



AAACAATATT 
GTAAGGAAGG 
GAAGCATACC 
GAAGCATATC 
GAAGCATACC 
GAGTTAGTGA 



TAATAGCTTG 
AAAATGATAC 
TTAAAGAATG 
TTAAAGAATG 
TTAAAGAATG 
TTGAGAACTG 



1800 
CGTTGATTCA 
CTAAAGTTAT 
TGTGCAATCC 
CGTGCAATCC 
TGTGCAATCC 
TGTAGAATCT 



1801 

ATTAGAAAAC 
ACATTATTGT 
GTACTACAAC 
GTCCTACAAC 
GTACTACAAC 
TTGCTTCAAC 



AAACATATAA 
TGGTTTGGAG 
AGACTCATCC 
AGACTCATTC 
AGACTCATCC 
AAACATACCC 



GAATTTGGAA 
GAAATCCCTT 
ATTGATAGAA 
ATTGATAGAA 
ATTGATAGAA 
AGAAATAGAA 



cpsI/M

ATTATTCTTG 
ACCAGATAAT 
GTTATACTAA 
GTTATACTGA 
GTTATACTAA 
ATTTTATTAA 



1850 
TTAATGATGG 
TTAAAGAAAT 
TTGATGATGG 
TTAATGATGG 
TTGATGATGG 
TAGATGATGG 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



1851 

ATCAACAGAT 
ATATAAAAA. 
ATCCACTGAT 
ATCCACTGAT 
ATCCACTGAT 

ATCTACAGAT 
** — * *_ 



GGTAGTAAAG 
...CTTGGAG
AATAGTGGAG 
AATAGTGGAG 
AATAGTGGAG 
AAAAGTAGTC 



AGTTATGTGA 
AGAACAATGT 
AAATTTGTGA 
AAATTTGTGA 
AAATTTGTGA 
ATATTTGTAA 



GGAGATAAGA 
CCGGATTATG 
TAATTTATCT 
TAATTTATCT 
TAATTTATCT 
TAATTTTTTA 



1900 
AAATCAGATG 
AAATTATTGA 
CAAGAAGATA 
CAAAAAGACG 
CAGGAAGATA 
AAAAGGGATA 



1901 1950 

Serotype IV AAAGAATTAA GACATTTCAC AAAACAAATG GAGGACAATC AAGCGCAAGG 

Serotype V ATGGAATGAG CATAATTATG ATGTTAGTAA AAATGTTTTT ATGAGAGAAG. 

Serotype Ia ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG

Serotype Ib ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTATC TTCGGCAAGG

Serotype III ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG 

Serotype VI GTCGCGTAAA AGTCTATCAT AAATACAATG GAGGTGCATC ATCAGCAAGA 

Consensus * — * * * ■ -* *- * — * — 

1951 2000 

Serotype IV AATTTAGGTA TTTTATACTC . TACAGGAGAT TTGATTGGTT TTGTTGACAG 

Serotype V CATATACTAA GAAGAATTT TGCT TATGTTTCTG ACTATGCAAG 

Serotype Ia AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG

Serotype Ib AACCTAGGTC TTGATAAATC CACAGGCGAA TTCATAACGT TTGTAGATAG

Serotype III AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG 

Serotype VI AATGTGGGAC TTGAGATGGC AGAAGGTGAA TTTATAACTT TTGTAGATAG 

Consensus -* — * * — * *- * — ** 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



2001 

CGACGATACA 
ATTGGATATT 
TGATGATTTT 
TGATGATTTT 
TGATGATTTT 
CGATGATGTT 
***



ATTGACCCTA 
ATTTATACTT 
GTAGCACCGA 
GTAGCACCGA 
GTAGCACCGA 
GTCGCACTAA 



AAATGTATGA 
ATGGGGGGTT 
ATATGATTGA 
ATATAATTGA 
ATATGATTGA 
ATATGATTGA 



AACGTTACTA 
CTATCTAGAT 
AATAATGTTA 
AATAATGTTA 
AATAATGTTA 
AATTATGCTG 



2050 
AATATATATG 
ACTGATGTGG 
AAAAATTTAA 
AAAAATTTAA 
AAAAATTTAA 
AATAATTTGT 
* 
Serotype IV
Serotype V
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



2051 

AAGATGAACA 
AGCTTTTAAA 
TCACTGAGAA 
TCACTGAGGA 
TCACTGAGAA 
TAACGGAGAA 
* 



AGTAGACTGG 
AAGTTTAGAT 
TGCTGATATA 
TGCTGATATA 
TGCTGATATA 
CGCAGATATA 



GTGCAATGTA 
CCTTTGAGGA 
GCAGAAGTAG 
GCAGAAGTAG 
GCAGAAGTAG 
TCAGAAATTG 



ATCACAAAAA 
TTCATGAGTG 
ATTTTGA...
ATTTTGA...
ATTTTGA...
ATTTCGA...



2100 
AATTTACTCT 
TTTTCTAGCA 
TATTTCGAAT 
TATTTCGAAT 
TATTTCGAAT 
AGTTTCAGAT 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



2101 

AACGGTGTTA 
AGGGAGATTA 
GAGAGAGATT 
GAGAGAGATT 
GAGAGAGATT 
GA...TTTTT



ACTTATATTA 
GTTGTGATGT 
ATAGAAAGAA 
ATAGAAAGAA 
ATAGAAAGAA 
ATAAAAGAAA 



TAATGGACCT 
GAATACAGGA 
GAAAAGACGA 
AAAAAGACGA 
GAAAAGACGA 
AAAAAGAAAA 



GAATACTATA 
TTAATAATTG 
AACTTTTATA 
AACTTTTATA 
AACTTTTATA 
GGTTACTATA 



2150 
ATGTGCTTAA 
GCGCTGTTAA 
AAGTCTTTAA 
AGGTCTTTAA 
AAGTTTTTAA 
GAGTTTTTCA 



_**_* 



Serotype IV 
Serotype V 
Serotype Ia
Serotype Ib
Serotype III 
Serotype VI 
Consensus 



2151 

TAAACAAGAT 
AGGACATCAC 
AAACAATAAC 
AAACAATAAT 
GAATAATAAC 
AAACAATAAG 



TTCCTATACG 
TTTTTAAAAT 
TCTTTAAAAG 
TCTTTAAAAG 
TCTTTGAAAG 
TCTCTCAAAG 



AATTTCTGAG 
CAAATATGTC 
AATTTTTATC 
AATTTTTATC 
AATTTTTATC 
AATTTTTTTC 



TACAAATAAG 
TATATATGAC 
AGGCAATAGA 
AGGTAATAGA 
AGGTAATAGA 
AGGAAATAAA 



2200 
ATTTTTAGTT 
AAAAGTGATT 
GTGGAAAATA 
GTGGAAAATA 
GTGGAAAATA 
GTAGAAAATG 



              2201                                              2250
Serotype IV   CAGTCTGCGA GGGGTTGTTA TCTAGAGATT TAGCTTTAAA AATAAAATTC
Serotype V    TAACTTCTCT TAATAAGACA TGTGTAGAGG TTACAACTAA TTTATTGATA
Serotype Ia   TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGCAA CTTGAGGTTT
Serotype Ib   TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGTAA CTTGAGGTTT
Serotype III  TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGTAA CTTGAGGTTT
Serotype VI   TTGTTTGGGG GAAATTATAT AAAAAAAGCA TTATTGGGGA TTTACGATTT
Consensus                                                 - *



              2251                                              2300
Serotype IV   CGTGAAGAAA AAAAAT...A TGAAGATACA CAGTTTTATT TTGATCTCAT
Serotype V    AACAGAGGGC TTAAGA...A TAAGAATATT ATTCAAAAGA TTGA..TGAT
Serotype Ia   GATGAGAACT TAAAAATTGG TGAGGATTTA CTTTTTAATT GTAAACTCTT
Serotype Ib   GATGAGAATT TAAAAATTGG TGAGGATTTA CTTTTTAATT GTAAAATTTT
Serotype III  GATGAGAACT TAAAAATTGG TGAGGATTTA CTTTTTAATT GCAAACTCTT
Serotype VI   AATGAAAAAT ACAAAATTGG TGAAGACTTG CTATTTAACT TTCAGATTTT
Consensus                *+



              2301                                              2350
Serotype IV   AAAAAATGCT AATAAGTTTG TTATTATAAG CCAACCTTTT TATAATTACT
Serotype V    ATAACAATAT ATCCGAGAAA TTATTTTAAT CCAAAGAATT TATTAACA..
Serotype Ia   ATGTCAAGAG CACCGTATAG TCGTAGATAC GACTTCTTCC TTATATACTT
Serotype Ib   ATGTCAAGAG CACTGCATAG TCGTAGATAC GACTTCTTCC TTGTACACCT
Serotype III  ATGTCAAGAG CACCGTATAG TCGTAGATAC GACTTCTTCC TTATATACTT
Serotype VI   AAATAAAGAA CATCGTATAG TTGTAGATAC TAGAAGATCA CTCTATACTT
Consensus                           * * *_

              2351                                              2400
Serotype IV   ACTACAGAAA AAATAGTACA ACAACTTCCT CATATAGTAG CTATCAATGG
Serotype V    ..GGTAAGGT TGATTGTCTG ACTAGTGTTA CCTATTCTAT ACATCATTAC
Serotype Ia   ATCGAATTGT AAAAACTTCC GCAA.TGAAT CAGAAATTCA ACGAAAACTC
Serotype Ib   ATCGCATCGT AAAGACTTCT GCAA.TGAAT CAGGAGTTCA ACGAAAATTC
Serotype III  ATCGAATTGT AAAAACTTCT GTAA.TGAAT CAGAAATTCA ACGAAAACTC
Serotype VI   ATCGTATTGA AGAAAAATCT ATAA.TGAAT CAACAATTTA ATAAAAATAC
Consensus     * - * +_+ * * *

              2401                                              2450
Serotype IV   GACATAATCG ATATCTGTAC TGAGTGTTAT TATTATGCAA AGGATTTTAA
Serotype V    GAAGGAAGTT GGAAAAGTTC TTCATTTATT TCAGATTCTC TAAAGATTAG
Serotype Ia   ATTAGATTTT ATAACAATTT TTAATGAAGT AAGTAGTTTG GTTCCTGCCA
Serotype Ib   ATTAGATTTT ATAACAATTT TTAATGAAAT AAGCAGTATT GTTCCTGCAA
Serotype III  ATTAGATTTT ATAACAATTT TTAATGAAAT AAGTAGTTTG GTTCCTGCCA
Serotype VI   ATTAGACTTC ATTGATATTT TTAATGAGAT TCATCAGGAT AGTCCGACAG
Consensus

              2451                                              2500
Serotype IV   TGGATTTGAA GAAGTTGCTT TTTCAAGATT ATTTGGTGCA TATTCGTTAG
Serotype V    AGTAAGGCTC ATAATTGATT TTTTATTTGG ATATGGTACT TATAGAATGC
Serotype Ia   AATTGGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT
Serotype Ib   AATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GGTAAAGTGT
Serotype III  GATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT
Serotype VI   AATTGTTTAA TTATGTGGAA GCGAAGTTTG TACGAGAAAA AATCAAGTGT
Consensus     - * - * + * * *

              2501                                              2550
Serotype IV   TAGCTAATAA AATTGTATAT AATAAAGATT ATAGAAAAAC CGAAGAATTA
Serotype V    TTCTAAGGTT TCTAAAGTTA AAGAAATAG
Serotype Ia   CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT
Serotype Ib   CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAGTA AAATCAAATT
Serotype III  CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT
Serotype VI   TTAAGGAAAA TGTTTGAATT AGGAGAAATA GCTGATGAAA ATTTACGTTT
Consensus     *_ * +

              2551                                              2600
Serotype IV   AGATAA
Serotype V
Serotype Ia   ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG
Serotype Ib   ACAACGAGAG ATTTTTTTCA AAGATGTTAA ATTATACCCT TTCTATAAAG
Serotype III  ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG
Serotype VI   ACAGAGATAT AAATTTTGGC AAGATATTAA ATCATATTCA ATATGCAAAG
Consensus

              2601                                              2650
Serotype IV
Serotype V
Serotype Ia   CGGTAAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA
Serotype Ib   CGGTTAAGTA CTTATCATTA AAGGGATTAT TGAGTATTTA CTTAATGAAA
Serotype III  CGGTCAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA
Serotype VI   CAATAAGGTT CTTATCTAAA AAACATATCT GTACGTTATA TTTGATGAAA
Consensus

              2651                                              2700
Serotype IV
Serotype V
Serotype Ia   TGTTCACCTA AACTATATGT TATGGCATAT AGAAGATTCA AAACAGTAGC
Serotype Ib   TGTTCACCGA TCTTGTATAT AAAATTATAT GACAGGTTTC AAAAACAGTA
Serotype III  TGTTCACCTA AACTATATGT TATGGCATAT AGAAGATTTC AAAAACAGTA
Serotype VI   TATTTTCCGT ACGTATATAT AAAGATGTAT AATAAATTTC AAAAGCAATA
Consensus

              2701                      2728
Serotype IV
Serotype V
Serotype Ia   TGGAGAAATT GGGAAAGAGA ATTTATAA
Serotype Ib   A
Serotype III  G
Serotype VI   A
Consensus



Notes. 

Serotype Ia: GenBank accession number AB028896; 
Serotype Ib: GenBank accession number AB050723; 
Serotype III: GenBank accession number AF163833; 
Serotype IV: GenBank accession number AF355776; 
Serotype V: GenBank accession number AF349539; 
Serotype VI: GenBank accession number AF337958. 
Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158, upper lines) and 
alp3 (AF291065, lower lines) used to distinguish them (relevant primers are shown). 

251 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTACAATT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
531 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTAGAATT 580
    bcaS1

301 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
581 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 630
    bcaS2

351 CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC 400
    ||||||*||||||||||||||||||||||||||||||||||||||*||||
631 CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC 680

401 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
681 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 730
    balS

451 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
731 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 780

501 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
781 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 830

551 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
831 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 880

601 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
881 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 930

651 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
931 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 980
    balA
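
The two heterogeneity sites marked in Figure 4 can be located by comparing an aligned row pair column by column. The following is a minimal sketch only, not part of the disclosed method: the function name `mismatch_sites` is hypothetical, and the two strings are transcribed from the row of Figure 4 that carries the (*) marks (alp2 positions 351-400, alp3 positions 631-680). Transcription errors in a scanned figure would show up as additional mismatches, so any site found this way should be checked against the GenBank records.

```python
def mismatch_sites(upper, lower, upper_start, lower_start):
    """Return (upper_pos, lower_pos, upper_base, lower_base) for each
    column where two gap-free, equal-length aligned rows differ."""
    if len(upper) != len(lower):
        raise ValueError("aligned rows must have equal length")
    sites = []
    for offset, (u, l) in enumerate(zip(upper, lower)):
        if u != l:
            # Convert the 0-based column offset into each sequence's numbering.
            sites.append((upper_start + offset, lower_start + offset, u, l))
    return sites


# Row transcribed from Figure 4 (assumed free of transcription errors).
ALP2_ROW = "CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC"  # alp2, 351-400
ALP3_ROW = "CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC"  # alp3, 631-680

if __name__ == "__main__":
    for site in mismatch_sites(ALP2_ROW, ALP3_ROW, 351, 631):
        print(site)
```

On this row the sketch reports two differing columns, at alp2 positions 357 and 396 (alp3 positions 637 and 676), matching the two (*) marks in the figure.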
Genetic markers of GBS genotypes 


Strain No. (%) 


Genotypes 


V-aIp3-ISJJ«-ISSa/* 


2(1)" 


V.? 


V-alp3-ISJJW-IS^i-ISSa< 


1 (03) 


V.l 
vo 


V-alp3-ISJJM 


27+1(14) 


V-l 


V-alp3 


1 f03) 


V-7 


V-alp3-lSi5W-IS5tf7-GBSa 


1 703) 




Ib-alp3-ISJ5*/ 


I (03) 


Ib-14 


V-None-GBSil 


1(03) 


V-6 


ni-alp3-ISZ?M-GBSil 




jji-4-2 


n-alpS-ISJiW-IS/JW-GBSil 


1(0. J) 


D-10 


n-alp3-IS73o7-IS7*tf 


t(fc5) 


rj.8 




KOS) 


n-9 


n-a-ISiiW-ISSa^-GBSfl 


2(1) 


tu 


Vl-as-ISJJW 


1(0.5) 


VI-4 


Vl-a-IS/JW 


1 (0.5) 


VI-5 


TV-+-1S1381 




IV-l 


TV-»-rSJ381-1S861 


1 (03) 


IV-2 




3(2) 


Ia-2 


Ia-a-IS7J«-IS<?67 


3(2) 


Ia-3 


Ia-a-IS75M 


34+1 (18) 


Ia-1 


Ia-None-IS/ttZ 


1 (03) 


Ia-8 


Ia-a-ISi5M-ISSa4 


WO 5) 


la-4 
iO t 






4a-0 


la- a-T <5;?a" /.frUSfl 


1 /A CV 

1(03) 






2(1) 


Jl-2 


Yl A aTSO TOUT TOO«^ 


f /A *\ 

1(0 J) 


H-7 


li-Aai>4-lo./ J 0i Oi -1SCKW 


1(0 J) 


11-5 


TT-AoHI 77JP7 TCJ?*7 TC7UA 
11-AUI 1-JQu J a 1-mSki oi -loi J yo 


2(1) 


IT 1 

11-3 


ll-AM-U J J 0 1 -1M 0 J - JO I JVO-UDOU 


14.1 fl\ 
1+1 {11 


XT 10 
11-1* 


n-AaB4-IS73M-IS567-IS7 54tf 


1 (03) 


H-6 


VI-AaB3b-ISiJtf7-IS*d7 




VI-2 


VI-Aa-ISiSW-IS^tfi-GBSfl 


1 703) 


VI-3 


VI-AaB7a-IS7JW-JSWi 


W0 5) 


VI-1 


IVAaOJ (rUIJvi 


1 f0 5) 


ruji 


Ib-AaB7-ISiJ*2 


1(03) 


Ib-9 


Dt>-AaB10-IS75S74S*d7 


1+1 (1) 


lb- 13 


Ib-AaB 10-IS73&MS<S67-ISSa«/ 


1 (0.5) 


lb- 12 


Ib-AaBl-ISJJW-ISWJ 


13+2(8) 


Ib-1 


Ib-AaB 1-IS7357 


1(03) 


Ib-2 


Ib-AaB3-ISi 381-1S86 J 


2(1) 


Ib-5 


Ib-AaB4b-IS7J*J-IS*o7 


1+1 (1) 


Ib-8 


Ib-AaB %~1SJ381-IS861 


1 (03) 


Ib-10 


Ib-AaB4-IS7J£7-IS*67 


1 (03) 


ib-7 


Ib-AaB 1 a-ISl.?#7-IS£d7 




Ib-3 


Ib-AaB2-ISi5mS5tf/ 


1 (03) 


JM 


Ib-AaBNi-IS/5*7-lS*67 


1(03) 


Ib-15 


Ib-AaBN2-ISJ5o7-ISSd7 


1(03) 


Ib-16 


Ib-AaB9a-IS75*7-IStf6'i 


1(03) 


Ib-11 


m-4-AaB2.ISii5i-IS^7-GBSil 


1(03) 


m-4-i 


IH-R-IS567-GBS0 


13+4(9) 


ffl-2 


n-R-IS/35/-ISS67-GBSH 


4(2) 


n-i 


IV-R-IS75a7-lS*67-GBSa 


1(03) 


IV-3 




22+4(13) 


in-i 


V-RB3a-ISiJW-ISMJ 


1(03) 


V-5 


ni-alpZas 


3+1(2) 


m-3 


Ia-alp2as 


1(03) 


Ia-7 


n-aIp2as-GBSa 


1(0.5) 


n-u 


Total=56 genotypes 


Total=177+17 


56 genotypes 


